## Pandas and SQL - Continued

In [None]:
#Import pandas
import pandas as pd

### Reading in the NRI point data
Below, we read the CSV file into a Pandas DataFrame as we have in the past, with a few exceptions.

Pandas, like MS Access, will infer the data type of each field from the values it's importing. However, we have some numeric-looking fields that need to be imported as strings: the `recordid`, `fips`, `hydro`, `mhydro`, and `mlra` fields. To do this, we create a dictionary that maps each field name to the data type we want to force. Any fields left off this list will get the inferred (default) data types.

We will also set the `recordid` column as the index for the DataFrame.

In [None]:
#Create the dataType dictionary
dtypeDict = {'recordid':'str',
             'fips':'str',
             'hydro':'str',
             'mhydro':'str',
             'mlra':'str'
            }

#Read in the data
dfPoint = pd.read_csv('../Data/nc_point.csv',
                      index_col='recordid',
                      dtype=dtypeDict)

In [None]:
#Show the data types
dfPoint.dtypes

In [None]:
#Have a quick look 
dfPoint.head()
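
To see why the override matters, here is a minimal sketch using a made-up two-row CSV held in memory (the values and the `csvText`, `dfDefault`, and `dfTyped` names are hypothetical, not taken from `nc_point.csv`). It compares the default type inference against a `dtype` dictionary like the one above.

```python
import io
import pandas as pd

# Hypothetical two-row CSV; the values are made up for illustration
csvText = "recordid,fips,xfact\n0001,01001,12.5\n0002,37063,7.0\n"

# Default inference: fields made up only of digits are read as integers,
# so any leading zeros are dropped
dfDefault = pd.read_csv(io.StringIO(csvText))

# With the override, the listed fields are kept as text
dfTyped = pd.read_csv(io.StringIO(csvText),
                      dtype={'recordid': 'str', 'fips': 'str'})

print(dfDefault['fips'].tolist())
print(dfTyped['fips'].tolist())
```

Codes such as FIPS identifiers are labels rather than quantities, which is why keeping them as strings is usually the safer choice.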

OK. Now it's your turn. Import the `nc_trend.csv` file. Set the following columns to be strings: `recordid`, `yr`, `landuse`, `broad`. (Other columns with nominal data should be strings too, but this will suffice...) Also, as above, set the `recordid` column to be the index.

In [None]:
dtypeDict = {'recordid':'str',
             'yr':'str',
             'landuse':'str',
             'broad':'str'
            }

dfTrend = pd.read_csv("../Data/nc_trend.csv", dtype=dtypeDict, index_col='recordid')
dfTrend.dtypes

OK, now we are ready to analyze the data (and learn how Pandas does it...)

* First, another example of an aggregate function: Let's count the number of sample points and sum the area they represent within each county, using the `dfPoint` DataFrame.

In [None]:
#Create the grouping object
grpCounty = dfPoint.groupby('fips')
type(grpCounty)

In [None]:
#With this DataFrameGroupBy object we can apply different aggregate functions.
dfX = grpCounty['fips'].agg('count')
dfX.head()
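
For readers coming from SQL, the count above corresponds to a `GROUP BY` query. The sketch below is only an illustration under stated assumptions: it uses a made-up three-row table (`dfDemo`), an in-memory SQLite database, and a hypothetical table name `point`, none of which come from the NRI data.

```python
import sqlite3
import pandas as pd

# Hypothetical miniature of the point table; values are made up
dfDemo = pd.DataFrame({'fips': ['37001', '37001', '37003'],
                       'xfact': [10.0, 5.0, 2.5]})

# Pandas: count the rows in each county
countsPandas = dfDemo.groupby('fips')['fips'].agg('count')

# SQL: the same count, run against an in-memory SQLite copy of the table
conn = sqlite3.connect(':memory:')
dfDemo.to_sql('point', conn, index=False)
countsSql = pd.read_sql_query(
    'SELECT fips, COUNT(fips) AS n FROM point GROUP BY fips ORDER BY fips',
    conn)
conn.close()

print(countsPandas)
print(countsSql)
```

Both approaches return one row per county. The Pandas result carries `fips` as its index, while the SQL result returns it as an ordinary column.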

In [None]:
#Sum up the xfact values for each county
dfX = grpCounty['xfact'].agg('sum')
dfX.head()

In [None]:
#Or we can combine the aggregating functions into a single 
# command using a dictionary to define how we want to aggregate

#Create a dictionary of field names: aggregating functions
grpFunctions = {'fips':['count'],'xfact':['sum']}

#Apply them all at once
dfX = grpCounty.agg(grpFunctions)
dfX.head()
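
Passing a dictionary whose values are lists produces two-level (MultiIndex) column names. If flat column names are preferred, one alternative is Pandas' named aggregation syntax. This is a sketch on a made-up table; `dfDemo`, `n_points`, and `total_xfact` are hypothetical names chosen for illustration.

```python
import pandas as pd

# Hypothetical miniature of the point table; values are made up
dfDemo = pd.DataFrame({'fips': ['37001', '37001', '37003'],
                       'xfact': [10.0, 5.0, 2.5]})

# Named aggregation: output_column=(input_column, aggregate_function)
dfSummary = dfDemo.groupby('fips').agg(
    n_points=('xfact', 'count'),
    total_xfact=('xfact', 'sum'))

print(dfSummary)
```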

## Transforming data
Pandas can pivot data too. Let's pivot our `dfTrend` table so that the year values become columns and each cell holds the value of the `broad` column for that record and year. This is done with the Pandas `pivot` method. The `columns` parameter is where we specify the column whose values become the new column headings, and the `values` parameter is where we specify the column that supplies the cell values. Because we don't specify an `index` parameter, the existing index (`recordid`) labels the rows.

In [None]:
dfX = dfTrend.pivot(columns='yr',values='broad')
dfX.head()
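
The reshaping is easier to see on a tiny table. The sketch below uses made-up records (`dfDemo` and its values are hypothetical) with the same three fields used above.

```python
import pandas as pd

# Hypothetical long-format table: one row per record per year
dfDemo = pd.DataFrame({'recordid': ['A', 'A', 'B', 'B'],
                       'yr': ['82', '87', '82', '87'],
                       'broad': ['1', '7', '5', '5']}).set_index('recordid')

# Wide format: one row per record, one column per year
dfWide = dfDemo.pivot(columns='yr', values='broad')

print(dfWide)
```

Note that `pivot` requires each index/column pair to be unique. If a record had two rows for the same year, the call would raise a `ValueError`.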

## Joining tables


In [None]:
#Build the broad codes dataFrame
dataDict = {'codes':['1','2','3','4','5','6','7','8','9','10','11','12'],
            'description':['Cropland_cultivated',
                      'Cropland_noncultivated',
                      'Pastureland',
                      'Rangeland',
                      'Forest land',
                      'Other rural land',
                      'Urban and built-up land',
                      'Rural transportation',
                      'Small water areas',
                      'Census water',
                      'Federal land',
                      'Conservation reserve program (CRP) land']}
dfBroadCodes = pd.DataFrame(dataDict,dtype='str')
dfBroadCodes

In [None]:
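#Join the code descriptions to the dfTrend dataFrame
# (merging on columns ignores the existing index, so dfY gets a new default index)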
dfY = pd.merge(left=dfTrend,
               right=dfBroadCodes,
               how='outer',
               left_on='broad',
               right_on='codes')

In [None]:
dfY
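
The choice of `how` determines which rows survive the join. The sketch below, on made-up tables (`dfLeft`, `dfCodes`, and the unmatched code `'99'` are hypothetical), contrasts an outer join with a left join. It also calls `reset_index()` first so that `recordid` is carried through the merge as a regular column, and uses the `indicator` option to show where each row came from.

```python
import pandas as pd

# Hypothetical trend-like table; code '99' has no match in the lookup table
dfLeft = pd.DataFrame({'recordid': ['A', 'B', 'C'],
                       'broad': ['1', '5', '99']}).set_index('recordid')

# Hypothetical lookup table; code '7' is not used by any record
dfCodes = pd.DataFrame({'codes': ['1', '5', '7'],
                        'description': ['Cropland_cultivated',
                                        'Forest land',
                                        'Urban and built-up land']})

# Outer join: keeps unmatched rows from both sides
dfOuter = pd.merge(left=dfLeft.reset_index(), right=dfCodes, how='outer',
                   left_on='broad', right_on='codes', indicator=True)

# Left join: keeps every record, matched or not, and nothing else
dfLeftJoin = pd.merge(left=dfLeft.reset_index(), right=dfCodes, how='left',
                      left_on='broad', right_on='codes')

print(dfOuter)
print(dfLeftJoin)
```

In SQL terms these correspond to a `FULL OUTER JOIN` and a `LEFT JOIN`, respectively.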

In [None]:
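#Re-pivot, this time showing the land use description for each year
dfZ = dfTrend.reset_index().merge(dfBroadCodes, how='left', left_on='broad', right_on='codes')
dfZ.pivot(index='recordid', columns='yr', values='description')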