## Pandas and JSON

#### 1. Import ```pandas```.
> [Pandas](https://pandas.pydata.org/docs/) is an open-source data analysis library. It can **[read, write and manipulate](https://pandas.pydata.org/docs/reference/io.html)** most common data formats (CSV, JSON, Excel, etc.).

In [None]:
import pandas as pd

#### 2. Read in the **JSON** data using Pandas

> Keep the [Pandas API Reference Guide](https://pandas.pydata.org/docs/reference/index.html#api) handy. For example, to read in JSON data via pandas, scroll down to the bottom of [the ```read_json``` reference](https://pandas.pydata.org/docs/reference/api/pandas.read_json.html#pandas.read_json) for example code.

In [None]:
df = pd.read_json('./../data/swimming_psb_data.json')
df

Unnamed: 0,c_Sport,c_Season,c_Event,c_Gender,n_DateSort,c_Person,c_PersonNatio,c_NOC,c_Result,n_ResultSort,c_Class
0,Swimming,2018,100m Backstroke,Men,20180809,Ryan Murphy,United States,United States,51.94,51940,Elite
1,Swimming,2018,100m Backstroke,Men,20180822,Xu Jiayu,China,China,52.30,52300,Elite
2,Swimming,2018,100m Backstroke,Men,20180806,Kliment Kolesnikov,Russia,Russia,52.51,52510,Elite
3,Swimming,2018,100m Backstroke,Men,20180819,Ryosuke Irie,Japan,Japan,52.53,52530,Elite
4,Swimming,2018,100m Backstroke,Men,20180728,Matt Grevers,United States,United States,52.55,52550,Elite
...,...,...,...,...,...,...,...,...,...,...,...
16943,Swimming,2018,800m Freestyle,Women,20180726,Klara Bosnjak,Croatia,Croatia,9:02.44,542440,Elite
16944,Swimming,2018,800m Freestyle,Women,20180519,Chantel Jeffrey,Canada,Canada,9:02.47,542470,Elite
16945,Swimming,2018,800m Freestyle,Women,20180421,Bindi Ware,Australia,Australia,9:02.48,542480,Elite
16946,Swimming,2018,800m Freestyle,Women,20180302,Ebony Blackstone,Australia,Australia,9:02.54,542540,Elite

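To experiment with ```read_json``` without the data file, the sketch below parses a small in-memory sample. The two records reuse values from the output above, but the records-style layout is an assumption; the real file may be organised differently, in which case a different ```orient``` would be needed.

```python
import io

import pandas as pd

# Hypothetical two-record sample; values are taken from the output above,
# but the records-style layout of the real file is an assumption.
sample_json = io.StringIO(
    '[{"c_Person": "Ryan Murphy", "c_NOC": "United States", "n_ResultSort": 51940},'
    ' {"c_Person": "Xu Jiayu", "c_NOC": "China", "n_ResultSort": 52300}]'
)

# orient="records" tells pandas to expect a list of {column: value} objects
sample_df = pd.read_json(sample_json, orient="records")

print(sample_df.shape)   # (rows, columns)
print(sample_df.dtypes)  # check which types pandas inferred for each column
```

Checking ```shape``` and ```dtypes``` straight after loading is a quick way to confirm that the file was parsed as expected.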

#### 3. Filter the DataFrame to contain only athletes from 'Great Britain'.
> e.g. https://stackoverflow.com/questions/17071871/how-to-select-rows-from-a-dataframe-based-on-column-values

In [None]:
new_gb_df = df.loc[df['c_NOC'] == 'Great Britain']
new_gb_df

Unnamed: 0,c_Sport,c_Season,c_Event,c_Gender,n_DateSort,c_Person,c_PersonNatio,c_NOC,c_Result,n_ResultSort,c_Class
39,Swimming,2018,100m Backstroke,Men,20180407,Chris Walker-Hebborn,England,Great Britain,54.23,54230,Elite
47,Swimming,2018,100m Backstroke,Men,20180406,Luke Greenbank,England,Great Britain,54.37,54370,Elite
60,Swimming,2018,100m Backstroke,Men,20180806,Nicholas Pyle,Great Britain,Great Britain,54.50,54500,Elite
62,Swimming,2018,100m Backstroke,Men,20180406,Xavier Castelli,Wales,Great Britain,54.60,54600,Elite
63,Swimming,2018,100m Backstroke,Men,20180805,Brodie Williams,Great Britain,Great Britain,54.60,54600,Elite
...,...,...,...,...,...,...,...,...,...,...,...
16873,Swimming,2018,800m Freestyle,Women,20180725,Freya Colbert,Great Britain,Great Britain,8:59.58,539580,Elite
16903,Swimming,2018,800m Freestyle,Women,20180302,Aisha Thornton,Great Britain,Great Britain,9:00.89,540890,Elite
16918,Swimming,2018,800m Freestyle,Women,20180323,Fleur Lewis,Great Britain,Great Britain,9:01.59,541590,Elite
16930,Swimming,2018,800m Freestyle,Women,20180507,Betsy Wizard,Great Britain,Great Britain,9:01.90,541900,Elite

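The filter works by building a boolean mask and passing it to ```.loc```. The sketch below splits those two steps apart on a small hand-built sample (rows adapted from the outputs above; the names ```is_gb```, ```gb_sample``` and ```gb_by_nationality``` are illustrative only). It also shows why the filter uses ```c_NOC``` rather than ```c_PersonNatio```: some athletes competing under the Great Britain NOC are listed with England or Wales as their nationality.

```python
import pandas as pd

# Small sample built from rows shown in the outputs above
sample_df = pd.DataFrame(
    {
        "c_Person": ["Ryan Murphy", "Luke Greenbank", "Xavier Castelli", "Nicholas Pyle"],
        "c_PersonNatio": ["United States", "England", "Wales", "Great Britain"],
        "c_NOC": ["United States", "Great Britain", "Great Britain", "Great Britain"],
    }
)

# Step 1: a boolean Series, True where the NOC is Great Britain
is_gb = sample_df["c_NOC"] == "Great Britain"

# Step 2: keep only the True rows; .copy() gives an independent DataFrame
gb_sample = sample_df.loc[is_gb].copy()

# Filtering on nationality instead would miss the England and Wales rows
gb_by_nationality = sample_df.loc[sample_df["c_PersonNatio"] == "Great Britain"]

print(len(gb_sample), len(gb_by_nationality))
```

Note that the filtered rows keep their original index labels, which is why the index in the output above jumps from 39 to 47 to 60.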

#### 4. Write the filtered dataframe to a new file called ```gb_swimming_psb_data.json``` within the ```data``` folder.
> e.g. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_json.html

In [None]:
new_gb_df.to_json('./../data/gb_swimming_psb_data.json', orient="records")

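The ```orient``` argument controls the shape of the JSON that is written. As a rough illustration, the sketch below serialises a two-row sample (values from the output above; the variable names are illustrative) to a string instead of a file and compares ```orient="records"``` with the DataFrame default, ```"columns"```.

```python
import json

import pandas as pd

# Two rows from the filtered output above, keeping their original index labels
gb_sample = pd.DataFrame(
    {"c_Person": ["Luke Greenbank", "Nicholas Pyle"], "n_ResultSort": [54370, 54500]},
    index=[47, 60],
)

# With no path given, to_json returns the JSON as a string
records_text = gb_sample.to_json(orient="records")
columns_text = gb_sample.to_json()  # default orient for a DataFrame is "columns"

records = json.loads(records_text)  # list with one object per row, index dropped
columns = json.loads(columns_text)  # one object per column, keyed by index label

print(records)
print(columns)
```

With ```orient="records"``` the index labels are not written, so the file is a plain list of row objects that other tools can read easily.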