# Description

This notebook shows example usage of the functions written for the assignment.

The assignment asks for basic statistics and correlations computed from data on the number of fire events in an area, the number of people living in it, its cadastral area and the number of companies selling alcohol. A natural hypothesis is that all of these values are positively correlated. The correlation matrices are calculated at the end of this notebook and, as requested, can be saved as CSV files.

The assignment also asks for one correlation other than the three mentioned in the PDF. This project calculates the specified correlations as well as the correlations between the cadastral area data and the other values. The calculations are done at both the powiat (county) and voivodeship levels.
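
A minimal sketch of how the extra correlations with cadastral area can be read off a merged table. It assumes pandas and a small made-up DataFrame (the values are hypothetical, not the assignment data) that uses the same column names as the merged tables later in this notebook. `DataFrame.corrwith` computes the Pearson correlation of each remaining column with `Area (ha)`.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the merged tables below
df = pd.DataFrame({
    "Number of events": [100, 200, 300, 400],
    "Population": [10_000, 20_000, 30_000, 40_000],
    "Area (ha)": [5_000, 4_000, 3_000, 2_000],
})

# Pearson correlation of the area column with every other column
area_corr = df.drop(columns="Area (ha)").corrwith(df["Area (ha)"])
print(area_corr)
```

With these toy values both correlations come out as -1, because the area decreases linearly as the other two columns grow.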

# Imports

In [1]:
import pandas as pd
from data_analysis_package import processing_data

# Loading / viewing relevant data

In [2]:
import os
# Folder with the raw input files; adjust to the local copy of the repository
data_dir = "D:\\user\\Documents\\GitHub\\Python_simple_data_analysis\\data_analysis_package\\data"

output_path = os.path.join(data_dir, "alkohol2024_clean.csv")
alcohol_path = os.path.join(data_dir, "alkohol2024.csv")
fire_path = os.path.join(data_dir, "pozary2024.csv")
population_path = os.path.join(data_dir, "powierzchnia_i_ludnosc2024.xlsx")
area_path = os.path.join(data_dir, "powierzchnia_geodezyjna2024.xlsx")

## Alcohol sellers data

In [3]:
alcohol_data = processing_data.process_alcohol_data(path_to_data=alcohol_path)
alcohol_data

|   | Voivodeship | Number of sellers |
|---|---|---|
| 0 | mazowieckie | 124 |
| 1 | wielkopolskie | 53 |
| 2 | małopolskie | 42 |
| 3 | śląskie | 37 |
| 4 | pomorskie | 29 |
| 5 | dolnośląskie | 21 |
| 6 | łódzkie | 20 |
| 7 | lubelskie | 17 |
| 8 | podlaskie | 13 |
| 9 | zachodniopomorskie | 13 |
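
The internals of `process_alcohol_data` are not shown in this notebook. As a hypothetical sketch only, assuming the raw file has one row per seller and a voivodeship column (the column name and the rows below are assumptions), a per-voivodeship count in the same shape as the table above could be built with `value_counts`, which also sorts in descending order:

```python
import pandas as pd

# Hypothetical raw table: one row per seller (column name assumed)
raw = pd.DataFrame({
    "Voivodeship": ["mazowieckie", "śląskie", "mazowieckie",
                    "pomorskie", "mazowieckie", "śląskie"],
})

# Count sellers per voivodeship and turn the result into a two-column table
sellers = (
    raw["Voivodeship"]
    .value_counts()
    .rename_axis("Voivodeship")
    .reset_index(name="Number of sellers")
)
print(sellers)
```

Setting the names explicitly with `rename_axis` and `reset_index(name=...)` keeps the column names stable across pandas versions, since pandas 2.x renamed the default `value_counts` output to `count`.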


## Fire events data

In [4]:
fire_data = processing_data.process_fire_data(path_to_data=fire_path)
fire_data

|   | Voivodeship | Powiat | Number of events |
|---|---|---|---|
| 0 | dolnośląskie | Jelenia Góra | 297 |
| 1 | dolnośląskie | Legnica | 500 |
| 2 | dolnośląskie | Wałbrzych | 316 |
| 3 | dolnośląskie | Wrocław | 1767 |
| 4 | dolnośląskie | bolesławiecki | 468 |
| ... | ... | ... | ... |
| 375 | świętokrzyskie | sandomierski | 343 |
| 376 | świętokrzyskie | skarżyski | 239 |
| 377 | świętokrzyskie | starachowicki | 224 |
| 378 | świętokrzyskie | staszowski | 256 |


## Population data

In [5]:
population_data = processing_data.process_population_data(path_to_data=population_path)
population_data

|   | Voivodeship | Powiat | Population |
|---|---|---|---|
| 0 | dolnośląskie | bolesławiecki | 87642.0 |
| 1 | dolnośląskie | dzierżoniowski | 94740.0 |
| 2 | dolnośląskie | głogowski | 85043.0 |
| 3 | dolnośląskie | górowski | 32459.0 |
| 4 | dolnośląskie | jaworski | 47213.0 |
| ... | ... | ... | ... |
| 375 | zachodniopomorskie | świdwiński | 43237.0 |
| 376 | zachodniopomorskie | wałecki | 50003.0 |
| 377 | zachodniopomorskie | Koszalin | 105540.0 |
| 378 | zachodniopomorskie | Szczecin | 389066.0 |


## Area data

In [6]:
area_data = processing_data.process_area_data(path_to_data=area_path)
area_data

|   | Voivodeship | Powiat | Area (ha) |
|---|---|---|---|
| 0 | dolnośląskie | bolesławiecki | 130349 |
| 1 | dolnośląskie | dzierżoniowski | 47851 |
| 2 | dolnośląskie | głogowski | 44326 |
| 3 | dolnośląskie | górowski | 73825 |
| 4 | dolnośląskie | jaworski | 58110 |
| ... | ... | ... | ... |
| 375 | zachodniopomorskie | wałecki | 141512 |
| 376 | zachodniopomorskie | łobeski | 106510 |
| 377 | zachodniopomorskie | Koszalin | 10557 |
| 378 | zachodniopomorskie | Szczecin | 30062 |


# Merging data sets by voivodeship and powiat

In [None]:
powiat_data_sets = [fire_data, population_data, area_data]
voivodeship_data_sets = [fire_data, population_data, area_data, alcohol_data]

powiat_data_sets_names = ['fire_data', 'population_data', 'area_data']
voivodeship_data_sets_names = ['fire_data', 'population_data', 'area_data', 'alcohol_data']

In [10]:
data_by_powiat = processing_data.merge_dataframes(list_of_dfs=powiat_data_sets, mode='Powiat')
data_by_powiat

|   | Voivodeship | Powiat | Number of events | Population | Area (ha) |
|---|---|---|---|---|---|
| 0 | dolnośląskie | Jelenia Góra | 297 | 75124.0 | 10930 |
| 1 | dolnośląskie | Legnica | 500 | 91948.0 | 5629 |
| 2 | dolnośląskie | Wałbrzych | 316 | 100294.0 | 8468 |
| 3 | dolnośląskie | Wrocław | 1767 | 673743.0 | 29280 |
| 4 | dolnośląskie | bolesławiecki | 468 | 87642.0 | 130349 |
| ... | ... | ... | ... | ... | ... |
| 375 | świętokrzyskie | sandomierski | 343 | 71824.0 | 67588 |
| 376 | świętokrzyskie | skarżyski | 239 | 68103.0 | 39541 |
| 377 | świętokrzyskie | starachowicki | 224 | 83634.0 | 52341 |
| 378 | świętokrzyskie | staszowski | 256 | 68448.0 | 92502 |
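
`merge_dataframes` is part of this project's package, and its code is not shown here. Below is a hypothetical sketch of a powiat-level merge, assuming an inner join on both key columns. The tables and values are made up. Joining on `Voivodeship` and `Powiat` together matters because some powiat names repeat across voivodeships. For example, there is a powiat brzeski in both opolskie and małopolskie.

```python
from functools import reduce

import pandas as pd

# Hypothetical powiat-level tables that share the two key columns
fire = pd.DataFrame({
    "Voivodeship": ["voiv_A", "voiv_A", "voiv_B"],
    "Powiat": ["powiat_1", "powiat_2", "powiat_3"],
    "Number of events": [10, 20, 30],
})
population = pd.DataFrame({
    "Voivodeship": ["voiv_A", "voiv_A", "voiv_B"],
    "Powiat": ["powiat_1", "powiat_2", "powiat_3"],
    "Population": [1000.0, 2000.0, 3000.0],
})
area = pd.DataFrame({
    "Voivodeship": ["voiv_A", "voiv_A"],  # powiat_3 is missing here on purpose
    "Powiat": ["powiat_1", "powiat_2"],
    "Area (ha)": [50, 60],
})

keys = ["Voivodeship", "Powiat"]

# Inner-join the tables one after another on both key columns
merged = reduce(
    lambda left, right: pd.merge(left, right, on=keys, how="inner"),
    [fire, population, area],
)
print(merged)
```

An inner join silently drops any powiat whose name is spelled differently in one of the sources. Comparing row counts before and after the merge is therefore a cheap consistency check. Here `powiat_3` is missing from the area table and disappears from the result.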


In [11]:
data_by_voivodeship = processing_data.merge_dataframes(list_of_dfs=voivodeship_data_sets, mode='Voivodeship')
data_by_voivodeship

|   | Voivodeship | Number of events | Population | Area (ha) | Number of sellers |
|---|---|---|---|---|---|
| 0 | dolnośląskie | 10957 | 2879271.0 | 1994704 | 21 |
| 1 | kujawsko-pomorskie | 5223 | 1996003.0 | 1797155 | 11 |
| 2 | lubelskie | 5675 | 2011047.0 | 2512243 | 17 |
| 3 | lubuskie | 3476 | 975023.0 | 1398772 | 3 |
| 4 | mazowieckie | 17321 | 5510527.0 | 3555871 | 124 |
| 5 | małopolskie | 6849 | 3429632.0 | 1518359 | 42 |
| 6 | opolskie | 2641 | 936725.0 | 941160 | 5 |
| 7 | podkarpackie | 5078 | 2071676.0 | 1784537 | 12 |
| 8 | podlaskie | 3187 | 1138216.0 | 2018685 | 13 |
| 9 | pomorskie | 5467 | 2359573.0 | 1954681 | 29 |
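
At voivodeship level, the fire, population and area figures are additive. One plausible way to build a table like `data_by_voivodeship` is therefore to sum the powiat rows within each voivodeship. The seller counts, which only exist per voivodeship, are then attached. This is a sketch under that assumption and not necessarily what `merge_dataframes` does in `'Voivodeship'` mode. All data below is made up.

```python
import pandas as pd

# Hypothetical merged powiat-level table (same column names as data_by_powiat)
by_powiat = pd.DataFrame({
    "Voivodeship": ["voiv_A", "voiv_A", "voiv_B"],
    "Powiat": ["powiat_1", "powiat_2", "powiat_3"],
    "Number of events": [10, 20, 30],
    "Population": [1000.0, 2000.0, 3000.0],
    "Area (ha)": [50, 60, 70],
})
# Hypothetical seller counts, only available per voivodeship
sellers = pd.DataFrame({
    "Voivodeship": ["voiv_A", "voiv_B"],
    "Number of sellers": [5, 2],
})

# Sum the additive powiat-level quantities within each voivodeship
by_voivodeship = (
    by_powiat.drop(columns="Powiat")
    .groupby("Voivodeship", as_index=False)
    .sum()
)

# Attach the seller counts on the shared key
by_voivodeship = by_voivodeship.merge(sellers, on="Voivodeship", how="inner")
print(by_voivodeship)
```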


# Calculating statistics

In [12]:
processing_data.calculate_basic_statistics(data_by_powiat)

|   | column | mean | median | std_dev | min | max |
|---|---|---|---|---|---|---|
| 0 | Number of events | 272.734211 | 210.0 | 283.54811 | 50.0 | 4252.0 |
| 1 | Population | 99043.442105 | 73723.0 | 123778.852099 | 18645.0 | 1861599.0 |
| 2 | Area (ha) | 82614.107895 | 77298.5 | 51884.839122 | 1330.0 | 297656.0 |
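
For comparison, here is a minimal pandas sketch that produces a table with the same layout as `calculate_basic_statistics`: one row per numeric column, with mean, median, standard deviation, minimum and maximum. It uses made-up data. Note that `std` in pandas is the sample standard deviation (`ddof=1`). This notebook does not state whether the project's function uses the sample or the population version.

```python
import pandas as pd

# Hypothetical data; in the notebook this role is played by data_by_powiat
df = pd.DataFrame({
    "Voivodeship": ["voiv_A", "voiv_A", "voiv_B", "voiv_B"],
    "Number of events": [10, 20, 30, 40],
    "Population": [1000.0, 2000.0, 3000.0, 4000.0],
})

# One row per numeric column, one column per statistic
stats = (
    df.select_dtypes("number")
    .agg(["mean", "median", "std", "min", "max"])
    .T
    .rename(columns={"std": "std_dev"})
    .rename_axis("column")
    .reset_index()
)
print(stats)
```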


In [13]:
processing_data.calculate_basic_statistics(data_by_voivodeship)

|   | column | mean | median | std_dev | min | max |
|---|---|---|---|---|---|---|
| 0 | Number of events | 6477.438 | 5447.5 | 3724.068 | 2641.0 | 17321.0 |
| 1 | Population | 2352282.0 | 2041361.5 | 1295666.0 | 936725.0 | 5510527.0 |
| 2 | Area (ha) | 1962085.0 | 1888289.5 | 682937.8 | 941160.0 | 3555871.0 |
| 3 | Number of sellers | 25.875 | 15.0 | 29.79234 | 3.0 | 124.0 |


In [14]:
# Pearson correlation matrix of the numeric columns (column 0 holds the voivodeship names)
data_by_voivodeship.iloc[:, 1:].corr()

|   | Number of events | Population | Area (ha) | Number of sellers |
|---|---|---|---|---|
| Number of events | 1.0 | 0.924425 | 0.574153 | 0.881658 |
| Population | 0.924425 | 1.0 | 0.498493 | 0.892643 |
| Area (ha) | 0.574153 | 0.498493 | 1.0 | 0.675992 |
| Number of sellers | 0.881658 | 0.892643 | 0.675992 | 1.0 |


In [15]:
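# Pearson correlation matrix of the numeric columns (columns 0 and 1 hold the voivodeship and powiat names)
data_by_powiat.iloc[:, 2:].corr()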

|   | Number of events | Population | Area (ha) |
|---|---|---|---|
| Number of events | 1.0 | 0.946744 | 0.033739 |
| Population | 0.946744 | 1.0 | -0.077745 |
| Area (ha) | 0.033739 | -0.077745 | 1.0 |
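
The description mentions that the correlation matrices can be saved as CSV files, but the saving step is not shown above. Below is a minimal sketch with made-up data and a hypothetical output file name. It assumes pandas 1.5 or later for the `numeric_only` argument of `DataFrame.corr`, which skips text columns without depending on column positions as `iloc[:, 2:]` does.

```python
import pandas as pd

# Hypothetical merged data with the same columns as data_by_powiat
df = pd.DataFrame({
    "Voivodeship": ["voiv_A", "voiv_A", "voiv_B", "voiv_B"],
    "Powiat": ["powiat_1", "powiat_2", "powiat_3", "powiat_4"],
    "Number of events": [10, 20, 30, 45],
    "Population": [1000.0, 2100.0, 2900.0, 4200.0],
    "Area (ha)": [50, 40, 70, 60],
})

# Pearson correlation over the numeric columns only (text columns are skipped)
corr = df.corr(numeric_only=True)

# Write the matrix; the index holds the row labels, so it is kept (the default)
corr.to_csv("correlation_by_powiat.csv")  # hypothetical file name

# Reading it back with index_col=0 restores the square, labelled matrix
restored = pd.read_csv("correlation_by_powiat.csv", index_col=0)
print(restored)
```

The same two calls would apply unchanged to the voivodeship-level matrix, with a different file name.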
