# Individual Dataframes Notebook

In this notebook, I will be transforming the corn data frame created in the `Corn Data Prep and Filtering` notebook into two new data frames. These new data frames will contain national corn data broken down by year and by month, respectively.

In [1]:
# import needed tools.
import pandas as pd
import numpy as np
import math

I will now import the climate and compiled corn data frames. I will also import a csv file containing the US population as a function of year. The US Population data was taken from https://www.multpl.com/united-states-population/table/by-year.

In [2]:
# corn data
corn_df_compiled = pd.read_csv('./compiled_corn_df.csv', index_col=0)

# climate data
climate_df = pd.read_csv('./climate.csv')

# USA population data
USPop = pd.read_csv('./USPop.csv')

# confirm proper import
display(climate_df.head(), corn_df_compiled.head(), USPop.head())

Unnamed: 0,Year,Cooling Degree Days,Heating Degree Days,Precipitation,Palmer Drought Severity Index (PDSI),Palmer Hydrological Drought Index (PHDI),Palmer Modified Drought Index (PMDI),Average Temperature,Maximum Temperature,Minimum Temperature,Palmer Z-Index
0,1950,263,14,30.87,1.93,1.93,1.93,51.39,63.61,39.17,4.14
1,1951,314,9,31.25,1.65,1.65,1.65,51.12,63.19,39.04,0.8
2,1952,355,6,26.34,-1.84,-1.84,-1.84,52.27,64.7,39.85,-2.32
3,1953,330,7,28.31,-1.76,-1.76,-0.88,53.37,65.76,40.96,-0.43
4,1954,349,10,26.15,-4.33,-4.33,-4.33,53.33,65.78,40.87,-2.76


Unnamed: 0,Year,Period,Data Item,Value
0,1950,YEAR,ACRES HARVESTED,72398000
1,1951,YEAR,ACRES HARVESTED,71191000
2,1952,YEAR,ACRES HARVESTED,71353000
3,1953,YEAR,ACRES HARVESTED,70738000
4,1954,YEAR,ACRES HARVESTED,68668000


Unnamed: 0,Year,USPop
0,1950,152270000
1,1951,154880000
2,1952,157550000
3,1953,160180000
4,1954,163030000


All data appears to have been imported successfully.

## National Year

I will now make a new data frame for all of the data collected on an annual basis. In this new table, I will want columns associated with all of the unique data items in the corn data frame.

In [3]:
# get unique values from the corn data frame
print(corn_df_compiled['Data Item'].value_counts().keys())

Index(['PRICE RECEIVED, MEASURED IN $ / BU', 'PRODUCTION, MEASURED IN BU',
       'ACRES HARVESTED', 'YIELD, MEASURED IN BU / ACRE',
       'PRODUCTION, MEASURED IN $'],
      dtype='object')


The unique data items are:
- `PRICE RECEIVED, MEASURED IN $ / BU`
- `PRODUCTION, MEASURED IN BU`
- `ACRES HARVESTED`
- `YIELD, MEASURED IN BU / ACRE`
- `PRODUCTION, MEASURED IN $`

In [4]:
# create a new data frame with year and the unique data items as column headers
national_year = pd.DataFrame(columns=[
    'Year', 
    'PRICE RECEIVED, MEASURED IN $ / BU', 
    'ACRES HARVESTED', 
    'PRODUCTION, MEASURED IN BU', 
    'YIELD, MEASURED IN BU / ACRE', 
    'PRODUCTION, MEASURED IN $'
])

# I will now populate this table with the proper years and 0's for all other entries to make sure the data frame has the proper shape.
# I want the table to contain data for years 1950 through 2020
for year in range(1950,2021):
    # For the given year, set the year and fill all other values in the row with 0's
    national_year.loc[len(national_year.index)] = [int(year), 0, 0, 0, 0, 0] 

# confirm the data frame was created successfully
display(national_year)

Unnamed: 0,Year,"PRICE RECEIVED, MEASURED IN $ / BU",ACRES HARVESTED,"PRODUCTION, MEASURED IN BU","YIELD, MEASURED IN BU / ACRE","PRODUCTION, MEASURED IN $"
0,1950,0,0,0,0,0
1,1951,0,0,0,0,0
2,1952,0,0,0,0,0
3,1953,0,0,0,0,0
4,1954,0,0,0,0,0
...,...,...,...,...,...,...
66,2016,0,0,0,0,0
67,2017,0,0,0,0,0
68,2018,0,0,0,0,0
69,2019,0,0,0,0,0


The data frame appears to have been created successfully. I will now populate the table with the proper data. Each row in the `corn_df_compiled` data frame holds a single value for one data item. I will thus scan through each row of `corn_df_compiled`, check whether it is annual data, and if so put its value in the cell of the `national_year` data frame that matches its year and data item.

In [5]:
# iterate through each row of the corn_df_compiled data frame
for i in corn_df_compiled.index:
    
    # check to see if the data is annual or not. If it is annual, proceed. Otherwise, ignore the data and move on.
    if (corn_df_compiled['Period'][i] == 'YEAR') and corn_df_compiled['Year'][i] < 2021:
        
        # If the data is annual data, determine which year it was collected for
        year = corn_df_compiled['Year'][i]
        
        # find the row in the national_year data frame where the data should go 
        ind = national_year[national_year["Year"] == year].index[0]
        
        # add the data to the national_year data frame in the column with the same label as the data item entry and the same year
        # .loc sets the cell in a single step, which avoids chained assignment on a possible copy
        national_year.loc[ind, corn_df_compiled['Data Item'][i]] = corn_df_compiled['Value'][i]
        
# confirm the data frame was populated correctly
national_year.head()

Unnamed: 0,Year,"PRICE RECEIVED, MEASURED IN $ / BU",ACRES HARVESTED,"PRODUCTION, MEASURED IN BU","YIELD, MEASURED IN BU / ACRE","PRODUCTION, MEASURED IN $"
0,1950,1.52,72398000,2764071000,38.2,4222366000
1,1951,1.66,71191000,2628937000,36.9,4364659000
2,1952,1.52,71353000,2980793000,41.8,4557031000
3,1953,1.48,70738000,2881801000,40.7,4291366000
4,1954,1.43,68668000,2707913000,39.4,3872433000
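
The same reshaping can also be expressed without an explicit loop. The sketch below is an alternative, not the method used in this notebook: it filters a small stand-in frame (`toy`, a hypothetical name, with a handful of values copied from the outputs above) down to its annual rows and uses `DataFrame.pivot` to turn each data item into its own column. It assumes each (year, data item) pair occurs only once among the annual rows, since `pivot` raises an error on duplicates.

```python
import pandas as pd

# Toy stand-in for corn_df_compiled; values are copied from the outputs shown above.
toy = pd.DataFrame({
    'Year': [1950, 1950, 1951, 1951, 1950],
    'Period': ['YEAR', 'YEAR', 'YEAR', 'YEAR', 'JAN'],
    'Data Item': [
        'ACRES HARVESTED', 'YIELD, MEASURED IN BU / ACRE',
        'ACRES HARVESTED', 'YIELD, MEASURED IN BU / ACRE',
        'PRICE RECEIVED, MEASURED IN $ / BU',
    ],
    'Value': [72398000, 38.2, 71191000, 36.9, 1.15],
})

# Keep only annual rows up to and including 2020, as in the loop above
annual = toy[(toy['Period'] == 'YEAR') & (toy['Year'] < 2021)]

# One row per year, one column per data item
wide = annual.pivot(index='Year', columns='Data Item', values='Value').reset_index()
wide.columns.name = None  # drop the leftover 'Data Item' axis label

print(wide)
```

Because the monthly `JAN` row is filtered out before the pivot, the price column does not appear in `wide` for this toy input.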


## Add Weather

I will now add the climate data to the `national_year` data frame. I first want to make sure that they have the same number of rows.

In [6]:
# display the lengths of both data frames
display(national_year.shape[0], climate_df.shape[0])

71

72

The `climate_df` data frame is one row longer (72 rows versus 71). I will display both data frames to determine why this is.

In [7]:
display(national_year)
display(climate_df)

Unnamed: 0,Year,"PRICE RECEIVED, MEASURED IN $ / BU",ACRES HARVESTED,"PRODUCTION, MEASURED IN BU","YIELD, MEASURED IN BU / ACRE","PRODUCTION, MEASURED IN $"
0,1950,1.52,72398000,2764071000,38.2,4222366000
1,1951,1.66,71191000,2628937000,36.9,4364659000
2,1952,1.52,71353000,2980793000,41.8,4557031000
3,1953,1.48,70738000,2881801000,40.7,4291366000
4,1954,1.43,68668000,2707913000,39.4,3872433000
...,...,...,...,...,...,...
66,2016,3.48,86748000,15148038000,174.6,51304297000
67,2017,3.36,82733000,14609407000,176.6,49567854000
68,2018,3.47,81276000,14340369000,176.4,52102404000
69,2019,3.75,81337000,13619928000,167.5,48940622000


Unnamed: 0,Year,Cooling Degree Days,Heating Degree Days,Precipitation,Palmer Drought Severity Index (PDSI),Palmer Hydrological Drought Index (PHDI),Palmer Modified Drought Index (PMDI),Average Temperature,Maximum Temperature,Minimum Temperature,Palmer Z-Index
0,1950,263,14,30.87,1.93,1.93,1.93,51.39,63.61,39.17,4.14
1,1951,314,9,31.25,1.65,1.65,1.65,51.12,63.19,39.04,0.80
2,1952,355,6,26.34,-1.84,-1.84,-1.84,52.27,64.70,39.85,-2.32
3,1953,330,7,28.31,-1.76,-1.76,-0.88,53.37,65.76,40.96,-0.43
4,1954,349,10,26.15,-4.33,-4.33,-4.33,53.33,65.78,40.87,-2.76
...,...,...,...,...,...,...,...,...,...,...,...
67,2017,359,5,32.31,0.89,0.89,0.85,54.55,66.35,42.74,0.15
68,2018,372,3,34.65,-1.80,-1.80,-1.80,53.52,65.09,41.94,-0.99
69,2019,370,5,34.82,5.22,5.22,5.22,52.68,64.08,41.27,0.93
70,2020,394,4,30.36,-0.32,2.45,1.85,54.38,66.33,42.41,-0.15
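
Rather than comparing the two displays by eye, the years that appear in only one frame can be listed directly. A minimal sketch on toy frames (`corn_years` and `climate_years` are hypothetical stand-ins that hold only a `Year` column):

```python
import pandas as pd

# Toy stand-ins; the real frames have many more rows and columns.
corn_years = pd.DataFrame({'Year': [2018, 2019, 2020]})
climate_years = pd.DataFrame({'Year': [2018, 2019, 2020, 2021]})

# Years present in one frame but missing from the other
only_in_climate = climate_years.loc[
    ~climate_years['Year'].isin(corn_years['Year']), 'Year'
].tolist()
only_in_corn = corn_years.loc[
    ~corn_years['Year'].isin(climate_years['Year']), 'Year'
].tolist()

print(only_in_climate, only_in_corn)
```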


The `climate_df` data frame has data for the year 2021, which `national_year` does not cover. This row should not be there, so I will drop it from `climate_df`.

In [8]:
# drop the last row of the data frame
climate_df.drop(climate_df.tail(1).index,inplace=True)

# confirm that the last row is now for the year 2020
display(climate_df.tail(1))

Unnamed: 0,Year,Cooling Degree Days,Heating Degree Days,Precipitation,Palmer Drought Severity Index (PDSI),Palmer Hydrological Drought Index (PHDI),Palmer Modified Drought Index (PMDI),Average Temperature,Maximum Temperature,Minimum Temperature,Palmer Z-Index
70,2020,394,4,30.36,-0.32,2.45,1.85,54.38,66.33,42.41,-0.15


The last row now corresponds to the year 2020. I should be good to merge the two data frames now.

In [9]:
# set the national_year data frame equal to the merged data frames
national_year = pd.concat([national_year, climate_df.drop(columns='Year')], axis=1)

#confirm that the merge was successful
display(national_year)

Unnamed: 0,Year,"PRICE RECEIVED, MEASURED IN $ / BU",ACRES HARVESTED,"PRODUCTION, MEASURED IN BU","YIELD, MEASURED IN BU / ACRE","PRODUCTION, MEASURED IN $",Cooling Degree Days,Heating Degree Days,Precipitation,Palmer Drought Severity Index (PDSI),Palmer Hydrological Drought Index (PHDI),Palmer Modified Drought Index (PMDI),Average Temperature,Maximum Temperature,Minimum Temperature,Palmer Z-Index
0,1950,1.52,72398000,2764071000,38.2,4222366000,263,14,30.87,1.93,1.93,1.93,51.39,63.61,39.17,4.14
1,1951,1.66,71191000,2628937000,36.9,4364659000,314,9,31.25,1.65,1.65,1.65,51.12,63.19,39.04,0.80
2,1952,1.52,71353000,2980793000,41.8,4557031000,355,6,26.34,-1.84,-1.84,-1.84,52.27,64.70,39.85,-2.32
3,1953,1.48,70738000,2881801000,40.7,4291366000,330,7,28.31,-1.76,-1.76,-0.88,53.37,65.76,40.96,-0.43
4,1954,1.43,68668000,2707913000,39.4,3872433000,349,10,26.15,-4.33,-4.33,-4.33,53.33,65.78,40.87,-2.76
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
66,2016,3.48,86748000,15148038000,174.6,51304297000,381,5,31.42,-0.82,1.06,-0.28,54.92,66.69,43.13,-0.58
67,2017,3.36,82733000,14609407000,176.6,49567854000,359,5,32.31,0.89,0.89,0.85,54.55,66.35,42.74,0.15
68,2018,3.47,81276000,14340369000,176.4,52102404000,372,3,34.65,-1.80,-1.80,-1.80,53.52,65.09,41.94,-0.99
69,2019,3.75,81337000,13619928000,167.5,48940622000,370,5,34.82,5.22,5.22,5.22,52.68,64.08,41.27,0.93
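
`pd.concat(..., axis=1)` lines rows up by index label, so the step above relies on both frames sharing the same index with the years in the same order. As a hedged alternative, the frames could be joined on the `Year` column itself. The sketch below uses small toy frames (`corn` and `climate` are hypothetical names, with values copied from the outputs above) and passes `validate='one_to_one'` so that a duplicated year raises an error instead of silently adding rows.

```python
import pandas as pd

# Toy stand-ins for national_year and climate_df
corn = pd.DataFrame({
    'Year': [1950, 1951, 1952],
    'ACRES HARVESTED': [72398000, 71191000, 71353000],
})
climate = pd.DataFrame({
    'Year': [1950, 1951, 1952, 1953],
    'Precipitation': [30.87, 31.25, 26.34, 28.31],
})

# A left join keeps exactly the years in `corn`; the extra climate year is ignored
merged = corn.merge(climate, on='Year', how='left', validate='one_to_one')

print(merged)
```

With a key-based join, an extra trailing row in one frame no longer needs to be dropped by hand before combining.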


The data frames appear to have merged successfully. I will now add the population data to this same `national_year` data frame.

In [10]:
# set the national_year data frame equal to itself merged with the USPop data frame
national_year = pd.concat([national_year, USPop.drop(columns='Year')], axis = 1)

# confirm the merge was successful
display(national_year)

Unnamed: 0,Year,"PRICE RECEIVED, MEASURED IN $ / BU",ACRES HARVESTED,"PRODUCTION, MEASURED IN BU","YIELD, MEASURED IN BU / ACRE","PRODUCTION, MEASURED IN $",Cooling Degree Days,Heating Degree Days,Precipitation,Palmer Drought Severity Index (PDSI),Palmer Hydrological Drought Index (PHDI),Palmer Modified Drought Index (PMDI),Average Temperature,Maximum Temperature,Minimum Temperature,Palmer Z-Index,USPop
0,1950,1.52,72398000,2764071000,38.2,4222366000,263,14,30.87,1.93,1.93,1.93,51.39,63.61,39.17,4.14,152270000
1,1951,1.66,71191000,2628937000,36.9,4364659000,314,9,31.25,1.65,1.65,1.65,51.12,63.19,39.04,0.80,154880000
2,1952,1.52,71353000,2980793000,41.8,4557031000,355,6,26.34,-1.84,-1.84,-1.84,52.27,64.70,39.85,-2.32,157550000
3,1953,1.48,70738000,2881801000,40.7,4291366000,330,7,28.31,-1.76,-1.76,-0.88,53.37,65.76,40.96,-0.43,160180000
4,1954,1.43,68668000,2707913000,39.4,3872433000,349,10,26.15,-4.33,-4.33,-4.33,53.33,65.78,40.87,-2.76,163030000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
66,2016,3.48,86748000,15148038000,174.6,51304297000,381,5,31.42,-0.82,1.06,-0.28,54.92,66.69,43.13,-0.58,322940000
67,2017,3.36,82733000,14609407000,176.6,49567854000,359,5,32.31,0.89,0.89,0.85,54.55,66.35,42.74,0.15,324990000
68,2018,3.47,81276000,14340369000,176.4,52102404000,372,3,34.65,-1.80,-1.80,-1.80,53.52,65.09,41.94,-0.99,326690000
69,2019,3.75,81337000,13619928000,167.5,48940622000,370,5,34.82,5.22,5.22,5.22,52.68,64.08,41.27,0.93,328240000


`national_year` is now complete and ready for export. I will export it to the `Data Modelling` folder for modelling.

In [11]:
# export the data frame as a csv
national_year.to_csv('../Data Modelling/nat_year.csv')

## National Month

I will now make a new data frame for all of the data collected on a monthly basis. I will first check which data items were recorded for periods other than `YEAR`.

In [12]:
corn_df_compiled[corn_df_compiled['Period'] != "YEAR"]['Data Item'].value_counts()

PRICE RECEIVED, MEASURED IN $ / BU    858
Name: Data Item, dtype: int64

It appears that only the price received per bushel was collected on a monthly basis. I will thus create a new data frame with this data as one column, alongside columns for the year, the month name, and the month number.

In [13]:
# Create the new data frame
national_month = pd.DataFrame(columns=['Year', 'Month', 'Month Number', 'PRICE RECEIVED, MEASURED IN $ / BU'])

# Create a list of Month by name and order
months = [('JAN', 1), ('FEB', 2), ('MAR', 3), ('APR', 4), ('MAY', 5), ('JUN', 6), ('JUL', 7), ('AUG', 8), ('SEP', 9), ('OCT', 10), ('NOV', 11), ('DEC', 12)]

# iterate through each year from 1950 to 2020
for year in range(1950,2021):
    
    # iterate through the named month list
    for month in months:
        
        # Find the current length of the table and add a new row.
        # Because the data frame is zero indexed, the index of the row to add will always be equal to the current length of the data frame.
        # Add the year, the name and order number of the months, and a 0 for the price received column.
        national_month.loc[len(national_month.index)] = [int(year), month[0], month[1], 0] 

# check that the data frame was created successfully.
display(national_month.head(20))

Unnamed: 0,Year,Month,Month Number,"PRICE RECEIVED, MEASURED IN $ / BU"
0,1950,JAN,1,0
1,1950,FEB,2,0
2,1950,MAR,3,0
3,1950,APR,4,0
4,1950,MAY,5,0
5,1950,JUN,6,0
6,1950,JUL,7,0
7,1950,AUG,8,0
8,1950,SEP,9,0
9,1950,OCT,10,0
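
Appending one row at a time with `.loc` leaves every column with the `object` dtype, which is why a cast is needed later on. A possible alternative, sketched here under the assumption that only the year and month columns are needed up front, builds all of the rows first and constructs the frame in a single call so that pandas can infer integer columns. The name `skeleton` is hypothetical.

```python
import itertools
import pandas as pd

month_names = ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN',
               'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC']

# One (year, month name, month number) tuple per month from 1950 through 2020
rows = [
    (year, name, number)
    for year, (number, name) in itertools.product(
        range(1950, 2021), enumerate(month_names, start=1)
    )
]

# Building the frame in one call lets pandas infer integer dtypes
skeleton = pd.DataFrame(rows, columns=['Year', 'Month', 'Month Number'])

print(skeleton.dtypes)
```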


The data frame has been created successfully. I will now fill in the price received data.

In [14]:
# iterate through each row of the corn_df_compiled data frame
for i in corn_df_compiled.index:
    
    # check to see if the data is annual or not. If it is annual, ignore it and move on. 
    # If it is not annual, proceed to add it to the data frame.
    if (corn_df_compiled['Period'][i] != 'YEAR') and corn_df_compiled['Year'][i] < 2021:
        
        # determine which year and month the data were collected for
        year = corn_df_compiled['Year'][i]
        month = corn_df_compiled['Period'][i]
        
        # Filter out all of the data that does not correspond to the correct year and month.
        # This will leave only a single row.
        year_filter = (national_month["Year"] == year)
        month_filter = (national_month["Month"] == month)
        
        # Apply the filters and determine the index of the remaining row.
        ind = national_month[year_filter & month_filter].index[0]
        
        # Put the data in the appropriate row and column.
        # .loc sets the cell in a single step, which avoids chained assignment on a possible copy
        national_month.loc[ind, corn_df_compiled['Data Item'][i]] = corn_df_compiled['Value'][i]

# confirm the data frame was populated successfully
display(national_month.head(5))

Unnamed: 0,Year,Month,Month Number,"PRICE RECEIVED, MEASURED IN $ / BU"
0,1950,JAN,1,1.15
1,1950,FEB,2,1.16
2,1950,MAR,3,1.19
3,1950,APR,4,1.26
4,1950,MAY,5,1.34


The data frame was populated successfully. Let's check that all of the data are in the correct format.

In [15]:
# show formats of all columns
national_month.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 852 entries, 0 to 851
Data columns (total 4 columns):
 #   Column                              Non-Null Count  Dtype 
---  ------                              --------------  ----- 
 0   Year                                852 non-null    object
 1   Month                               852 non-null    object
 2   Month Number                        852 non-null    object
 3   PRICE RECEIVED, MEASURED IN $ / BU  852 non-null    object
dtypes: object(4)
memory usage: 65.6+ KB


The `Year` and `Month Number` columns should both be integer types.

In [16]:
# Cast the columns as the appropriate data type
national_month['Year'] = national_month['Year'].astype('int')
national_month['Month Number'] = national_month['Month Number'].astype('int')

# check that the types changed
national_month.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 852 entries, 0 to 851
Data columns (total 4 columns):
 #   Column                              Non-Null Count  Dtype 
---  ------                              --------------  ----- 
 0   Year                                852 non-null    int32 
 1   Month                               852 non-null    object
 2   Month Number                        852 non-null    int32 
 3   PRICE RECEIVED, MEASURED IN $ / BU  852 non-null    object
dtypes: int32(2), object(2)
memory usage: 58.9+ KB


The `Year` and `Month Number` columns now appear to be of the correct type.
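
The `info()` output above still lists the price column as `object`. If a numeric dtype is wanted there as well, `pd.to_numeric` is one option. The sketch below uses a toy frame (`prices`, a hypothetical name) and assumes every entry can be parsed as a number; otherwise `to_numeric` raises an error by default.

```python
import pandas as pd

col = 'PRICE RECEIVED, MEASURED IN $ / BU'

# Toy frame whose price column is stored as generic Python objects
prices = pd.DataFrame({col: pd.Series([1.15, 1.16, 0], dtype='object')})

# Convert the column to a numeric dtype
prices[col] = pd.to_numeric(prices[col])

print(prices.dtypes)
```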

### Add Weather

Climate data was only collected on an annual basis. Therefore, I will have to model it onto a monthly schema. For this exercise, I will only be modelling the precipitation and temperature. I will first create a data frame for the manipulated climate information, and then I will proceed to populate the data frame.

In [17]:
# Create the new data frame.
clim_manip = pd.DataFrame(columns=['Year', 'Month', 'Precipitation', 'Average Temperature'])

# Add year and month data to the data frame
for year in range(1950,2021):
    for month in months:
        clim_manip.loc[len(clim_manip.index)] = [int(year), month[0], 0, 0] 

# confirm the data frame was created.
clim_manip.head()

Unnamed: 0,Year,Month,Precipitation,Average Temperature
0,1950,JAN,0,0
1,1950,FEB,0,0
2,1950,MAR,0,0
3,1950,APR,0,0
4,1950,MAY,0,0


The data frame was created successfully. I will now populate it with the proper data.

In [18]:
# iterate through each row of the clim_manip data frame
for i in clim_manip.index:
    
    # find the year of the row. y is short for year
    y = clim_manip["Year"][i]
    
    # index where the year y is found in the climate_df
    clim_ind = climate_df.loc[climate_df["Year"] == y].index[0]
    
    # mon is short for month. mon can be calculated from the index
    mon = (i + 1) % 12
    
    # If the month is 12, (i + 1) modulo 12 will be 0.
    # So, if the value is 0, change it to 12.
    if mon == 0:
        mon = 12
    
    # precip is short for precipitation. 
    # I will pull the precipitation value out of the original climate data frame.
    total_precip = climate_df[' Precipitation'][clim_ind]
    
    # I will then distribute the rain over the various months using a cosine function.
    # Thus, the total rain for the year will be the sum of the rain for the individual months.
    # In this way, the most rain will fall in December, and no rain will fall in June.
    # .loc sets the cell in a single step, which avoids chained assignment on a possible copy.
    clim_manip.loc[i, 'Precipitation'] = (total_precip/12) * math.cos(2*math.pi/12*mon)  + (total_precip/12)
    
    
    # To calculate the temperature by month, I will be finding the max and min temperatures for the year.
    # I will then distribute temperature throughout the year, again using a cosine function.
    # The max temperature will occur in July, typically the hottest month of the year.
    # The min temperature will occur in January, typically the coldest month of the year.
    
    # temp is short for temperature.
    year_max_temp = climate_df[' Maximum Temperature'][clim_ind]
    year_min_temp = climate_df[' Minimum Temperature'][clim_ind]
    
    # calculate monthly temps from yearly max and min (mon + 5 shifts the cosine peak to July)
    clim_manip.loc[i, 'Average Temperature'] = (year_max_temp - year_min_temp)/2 * math.cos(2*math.pi/12*(mon+5)) + (year_max_temp + year_min_temp)/2
    
# confirm that the data frame was populated correctly.
clim_manip.head(20)

Unnamed: 0,Year,Month,Precipitation,Average Temperature
0,1950,JAN,4.80035,39.17
1,1950,FEB,3.85875,40.80717
2,1950,MAR,2.5725,45.28
3,1950,APR,1.28625,51.39
4,1950,MAY,0.34465,57.5
5,1950,JUN,0.0,61.97283
6,1950,JUL,0.34465,63.61
7,1950,AUG,1.28625,61.97283
8,1950,SEP,2.5725,57.5
9,1950,OCT,3.85875,51.39
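
The two cosine formulas can be checked outside of pandas. The sketch below wraps them in two helper functions (`monthly_precip` and `monthly_temp`, hypothetical names that are not used elsewhere in this notebook) and applies them to the 1950 values from `climate_df`. Because the twelve cosine terms sum to zero over a full cycle, the monthly precipitation values add back up to the annual total, and the temperature curve reaches the yearly minimum in January and the yearly maximum in July.

```python
import math

def monthly_precip(total_precip, mon):
    """Share of the annual precipitation assigned to month `mon` (1-12)."""
    return (total_precip / 12) * math.cos(2 * math.pi / 12 * mon) + total_precip / 12

def monthly_temp(year_max_temp, year_min_temp, mon):
    """Modelled temperature for month `mon` (1-12), peaking in July."""
    amplitude = (year_max_temp - year_min_temp) / 2
    midpoint = (year_max_temp + year_min_temp) / 2
    return amplitude * math.cos(2 * math.pi / 12 * (mon + 5)) + midpoint

# 1950 values taken from climate_df: precipitation 30.87, max 63.61, min 39.17
precip_1950 = [monthly_precip(30.87, m) for m in range(1, 13)]
temps_1950 = [monthly_temp(63.61, 39.17, m) for m in range(1, 13)]

print(sum(precip_1950))                  # compare against the annual total
print(min(temps_1950), max(temps_1950))  # compare against the yearly min and max
```

Note that this is a modelling convenience rather than measured monthly data: every year gets the same seasonal shape, scaled by that year's totals.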


The data frame appears to have been populated successfully. I will now merge it with the `national_month` data frame. I will first ensure that they are both the same length.

In [19]:
# Print the lengths of the data frames
print(clim_manip.shape[0], national_month.shape[0])

852 852


The two data frames are the same length. I will now merge them.

In [20]:
# set the national_month data frame equal to the merged data frames
national_month = pd.concat([national_month, clim_manip.drop(columns=['Year', 'Month'])], axis=1)

#confirm that the merge was successful
display(national_month)

Unnamed: 0,Year,Month,Month Number,"PRICE RECEIVED, MEASURED IN $ / BU",Precipitation,Average Temperature
0,1950,JAN,1,1.15,4.80035,39.17
1,1950,FEB,2,1.16,3.85875,40.80717
2,1950,MAR,3,1.19,2.5725,45.28
3,1950,APR,4,1.26,1.28625,51.39
4,1950,MAY,5,1.34,0.34465,57.5
...,...,...,...,...,...,...
847,2020,AUG,8,3.12,1.265,64.727664
848,2020,SEP,9,3.4,2.53,60.35
849,2020,OCT,10,3.61,3.795,54.37
850,2020,NOV,11,3.79,4.721044,48.39


Now, I want to map the `Month` column onto a circle to capture the cyclic nature of months, so that December and January end up next to each other. To do this, I will map the `Month Number` column onto two new columns holding the sine and cosine of the month's angle on that circle.

In [21]:
# Create base angle for months
month_base_angle = 2 * math.pi / 12

# Create cosine and sine columns for the data 
month_cos = np.cos(national_month['Month Number'] * month_base_angle)
month_sin = np.sin(national_month['Month Number'] * month_base_angle)

# Add the sine and cosine columns to the national_month data frame
national_month['month_cos'] = month_cos
national_month['month_sin'] = month_sin

# Ensure that the columns were created properly
display(national_month.head(5))

Unnamed: 0,Year,Month,Month Number,"PRICE RECEIVED, MEASURED IN $ / BU",Precipitation,Average Temperature,month_cos,month_sin
0,1950,JAN,1,1.15,4.80035,39.17,0.8660254,0.5
1,1950,FEB,2,1.16,3.85875,40.80717,0.5,0.866025
2,1950,MAR,3,1.19,2.5725,45.28,6.123234000000001e-17,1.0
3,1950,APR,4,1.26,1.28625,51.39,-0.5,0.866025
4,1950,MAY,5,1.34,0.34465,57.5,-0.8660254,0.5
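
To see what this encoding buys, compare distances between months on the circle. With the raw `Month Number`, December (12) and January (1) are 11 apart; on the circle they are exactly as close as January and February. A small sketch using only the standard library (`month_to_circle` and `circle_distance` are hypothetical helper names):

```python
import math

def month_to_circle(month_number):
    """Map a month number (1-12) to a (cos, sin) point on the unit circle."""
    angle = 2 * math.pi / 12 * month_number
    return math.cos(angle), math.sin(angle)

def circle_distance(month_a, month_b):
    """Straight-line distance between two months on the unit circle."""
    (ax, ay), (bx, by) = month_to_circle(month_a), month_to_circle(month_b)
    return math.hypot(ax - bx, ay - by)

dec_jan = circle_distance(12, 1)  # neighbouring months across the year boundary
jan_feb = circle_distance(1, 2)   # neighbouring months within a year
jan_jul = circle_distance(1, 7)   # months half a year apart

print(dec_jan, jan_feb, jan_jul)
```

Both columns are needed: the cosine alone cannot tell apart months that mirror each other around December, such as February and October.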


Month was mapped to sine and cosine columns correctly. I will now drop the `Month` column from the data frame so that it only contains numerical data.

In [22]:
# drop the column
national_month.drop(columns='Month', inplace=True)

# confirm the column dropped successfully
national_month.head()

Unnamed: 0,Year,Month Number,"PRICE RECEIVED, MEASURED IN $ / BU",Precipitation,Average Temperature,month_cos,month_sin
0,1950,1,1.15,4.80035,39.17,0.8660254,0.5
1,1950,2,1.16,3.85875,40.80717,0.5,0.866025
2,1950,3,1.19,2.5725,45.28,6.123234000000001e-17,1.0
3,1950,4,1.26,1.28625,51.39,-0.5,0.866025
4,1950,5,1.34,0.34465,57.5,-0.8660254,0.5


The `national_month` data frame is now ready for export. I will export it to the `Data Modelling` folder for modelling.

In [23]:
# export the data frame as a csv
national_month.to_csv('../Data Modelling/nat_month.csv')

This is the end of this notebook.