# Process files to unify data for town Wizna

Data concerning precipitation downloaded from IMGW website are divided into multiple files. Some files contains data from 5-year-periods, others contain data for one-year-periods. All files has data for many diffrent meteorological stations in Poland. The aim of this notebook is to extract data for Wizna town from each file and create one single DataFrame (and CSV file) keeping data since 1951 to 2019. 

All files have the same structure described below.

Data description:

- Meteorological station ID (PL: Kod stacji)                                              9
- Meteorological station name (PL: Nazwa stacji                                           30
- Year (PL: Rok)                                                                          4
- Month (PL: Miesiąc)                                                                     2
- Total monthly precipitation (PL: Miesięczna suma opadów [mm])                           8/1
- Measurement status SUMM (PL: Status pomiaru SUMM)                                             1
- Number of days with snow fall(PL: Liczba dni z opadem śniegu)                           5
- Measurement status LDS(PL: Status pomiaru LDS)                                          1
- Maximal precipitation (PL: Opad maksymalny [mm])                                        8/1
- Measurement status MAXO (PL: Status pomiaru MAXO)                                       1
- First day of maximal precipitation (PL: Dzień pierwszy wystąpienia opadu maksymalnego)  2
- Last day of maximal precipitation (PL: Dzień ostatni wystąpienia opadu maksymalnego)    2
- Number of days with snow cover(PL: Liczba dni z pokrywą śnieżną)                        5
- Measurement status LDPS (PL: Status pomiaru LDPS)                                       1

- Status "8" - missing measurement (PL: Status "8" brak pomiaru)
- Status "9" - lack of meteorological phenomenon(PL: Status "9" brak zjawiska)

In [1]:
import pandas as pd
import os

In [2]:
# Functions for elementary steps needed to be done in order to create 
# one Data Frame with precipitation data for one given station


def list_files_in_directory(directory):
    """(str) --> list_of_file
    Creates a list with relative paths to files containing data in given directory.
    >>>list_files_in_directory('Data')
    ['Data/o_m_1950_1955.csv',
     'Data/o_m_1956_1960.csv',
     'Data/o_m_1961_1965.csv',
     'Data/o_m_1966_1970.csv']
    """
    list_of_files=[]
    for file in os.listdir(directory):
        list_of_files.append(directory+'/'+file)
    list_of_files.sort()
    return list_of_files


def merge_data_for_given_station(list_of_files, column_names, station_name):
    """(list, list, string) --> DataFrame
    Merges data from all files form list_of_files filtering data to a given station_name. As files does not 
    contain headers, the list with column_names must be given as a parameter
    """
    data = pd.DataFrame(index=None, columns=column_names)
    for file in list_of_files:
        data_tmp = pd.read_csv(file, delimiter=',', skipinitialspace=True, encoding = 'latin2', header=None, names=column_names)
        data_tmp.drop(data_tmp.loc[data_tmp['station_name']!=station_name].index, inplace=True)
        data=data.append(data_tmp, ignore_index=True)
    return data



In [3]:
# Merging the data into one Data Frame
precip_column_names = ['station_ID', 'station_name', 'year', 'month', 'total_precip', 'SUMN_status', 'n_snow_fall', 'LDS_status', 'max_precip', 'MAXO_status', 'first_day_max_precip', 'last_day_max_precip', 'n_snow_cover', 'LDPS_status']
list_of_precip_files = list_files_in_directory('Data')
precip_data_WIZNA = merge_data_for_given_station(list_of_precip_files, precip_column_names, 'WIZNA')

ParserError: Error tokenizing data. C error: Calling read(nbytes) on source failed. Try engine='python'.

In [4]:
# Store DataFrame with merged data from all years for one station in CSV file
precip_data_WIZNA.to_csv(r'Data/Merged_Data/precipitat_data_WIZNA_1951_2019.csv', index=False)

NameError: name 'precip_data_WIZNA' is not defined

In [5]:
# show the final Data Frame
precip_data_WIZNA

NameError: name 'precip_data_WIZNA' is not defined

### Data for town Wizna have been stored in one single CSV file. To proceed with data exploration please open a file: 03_Explore_precipitation_data.ipynb