## Reading a JSON file from a Website

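Let's get some data to start working with. I care about the planet, so global CO2 emissions seem like a good place to start.
We're going to use the `Package` class from the `datapackage` library, which loads a Data Package descriptor (`datapackage.json`) and gives access to the data resources it lists.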

#### Import Libraries

```python
import numpy as np
import pandas as pd
from datapackage import Package
from datetime import datetime as dt
```

#### Check out the files available

```python
package = Package('https://datahub.io/core/co2-fossil-global/datapackage.json')
print(package.resource_names)
```

```
['validation_report', 'global_csv', 'global_json', 'co2-fossil-global_zip', 'global']
```
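Indexing with `package.resources[1]` depends on the order of resources in the descriptor. A minimal sketch of a sturdier lookup: find the position by name in the list printed above. (As far as I know, `datapackage` also offers `package.get_resource('global_csv')`, which returns the resource directly.)

```python
# Resource names as printed by package.resource_names above
resource_names = ['validation_report', 'global_csv', 'global_json',
                  'co2-fossil-global_zip', 'global']

# Look the position up by name instead of hard-coding it
csv_index = resource_names.index('global_csv')
print(csv_index)  # 1
```

With the live package, `package.resources[package.resource_names.index('global_csv')]` picks the same resource even if the descriptor is reordered.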


#### Read the second resource, 'global_csv', into a DataFrame

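```python
columns = ['Year', 'Total', 'Gas Fuel', 'Liquid Fuel', 'Solid Fuel', 'Cement', 'Gas Flaring', 'Per Capita']
# read() returns each row as a plain list of values, so we supply the column names ourselves
df = pd.DataFrame(package.resources[1].read(), columns=columns)
df.head()
```

|   | Year | Total | Gas Fuel | Liquid Fuel | Solid Fuel | Cement | Gas Flaring | Per Capita |
|---|------|-------|----------|-------------|------------|--------|-------------|------------|
| 0 | 1751 | 3 | 0 | 0 | 3 | 0 | 0 |  |
| 1 | 1752 | 3 | 0 | 0 | 3 | 0 | 0 |  |
| 2 | 1753 | 3 | 0 | 0 | 3 | 0 | 0 |  |
| 3 | 1754 | 3 | 0 | 0 | 3 | 0 | 0 |  |
| 4 | 1755 | 3 | 0 | 0 | 3 | 0 | 0 |  |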
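Passing `columns=` is needed because `read()` returns rows as plain lists. The `datapackage` library's `read()` also accepts `keyed=True`, which returns each row as a dict keyed by field name, so pandas can take the column names straight from the data. The sketch below uses two hand-written rows in that shape (values copied from the head above) instead of a network call.

```python
import pandas as pd

# Two rows shaped like the output of resource.read(keyed=True):
# one dict per row, keyed by field name (values copied from df.head() above)
rows = [
    {'Year': 1751, 'Total': 3, 'Gas Fuel': 0, 'Liquid Fuel': 0,
     'Solid Fuel': 3, 'Cement': 0, 'Gas Flaring': 0, 'Per Capita': None},
    {'Year': 1752, 'Total': 3, 'Gas Fuel': 0, 'Liquid Fuel': 0,
     'Solid Fuel': 3, 'Cement': 0, 'Gas Flaring': 0, 'Per Capita': None},
]

# pandas builds the column labels from the dict keys, so no columns= list is needed
keyed_df = pd.DataFrame(rows)
print(keyed_df.columns.tolist())
```

With the live package this would be `pd.DataFrame(package.resources[1].read(keyed=True))`.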


#### Check the column dtypes

```python
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 260 entries, 0 to 259
Data columns (total 8 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Year         260 non-null    int64 
 1   Total        260 non-null    object
 2   Gas Fuel     260 non-null    object
 3   Liquid Fuel  260 non-null    object
 4   Solid Fuel   260 non-null    object
 5   Cement       260 non-null    object
 6   Gas Flaring  260 non-null    object
 7   Per Capita   61 non-null     object
dtypes: int64(1), object(7)
memory usage: 16.4+ KB
```
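Why `object`? My best guess (not verified against this dataset) is that `read()` casts values using the resource's Table Schema, and fields typed as `number` come back as Python `Decimal` objects, which pandas stores as `object`. A small sketch with hand-made `Decimal` values reproduces the symptom and shows that `astype` converts them:

```python
from decimal import Decimal

import pandas as pd

# Hypothetical values typed the way a schema-casting reader might return them
totals = pd.Series([Decimal('3'), Decimal('3'), Decimal('4')])
print(totals.dtype)  # object

# Casting to int64 converts each Decimal to a native integer
as_int = totals.astype('int64')
print(as_int.dtype)  # int64
```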


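Looks like we want to change some of the dtypes! Only Year came through as an integer; the other columns are `object`, even though we know the emissions columns hold integers. And what's up with Per Capita? Only 61 of its 260 rows are non-null.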

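```python
# Cast the emissions columns (currently object dtype) to 64-bit integers
int_cols = ['Total', 'Gas Fuel', 'Liquid Fuel', 'Solid Fuel', 'Cement', 'Gas Flaring']
df = df.astype(dict.fromkeys(int_cols, 'int64'))

# Count each distinct Per Capita value, keeping missing values in the tally
df['Per Capita'].value_counts(dropna=False)
```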

```
NaN     199
1.12      6
1.11      5
1.13      4
1.16      3
0.69      3
1.14      3
1.17      3
1.18      2
1.19      2
1.28      2
1.10      2
0.80      1
1.15      1
0.97      1
1.09      1
1.30      1
1.21      1
0.84      1
1.24      1
0.88      1
0.68      1
1.20      1
0.79      1
0.74      1
1.27      1
0.85      1
1.05      1
1.01      1
1.23      1
0.86      1
0.64      1
0.92      1
0.83      1
0.98      1
1.33      1
0.77      1
0.94      1
Name: Per Capita, dtype: int64
```
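The `NaN` count answers part of the question: most rows have no Per Capita value at all. To see which years do, you can filter with `notna()`. A sketch on a toy frame (the years and values here are made up):

```python
import pandas as pd

# Toy frame: Per Capita is only recorded for the later years
toy = pd.DataFrame({
    'Year': [2000, 2001, 2002, 2003],
    'Per Capita': [None, None, 0.64, 0.69],
})

# How many rows are missing a Per Capita value?
n_missing = toy['Per Capita'].isna().sum()

# Which years do have a value?
years_with_value = toy.loc[toy['Per Capita'].notna(), 'Year']
print(n_missing, years_with_value.min(), years_with_value.max())  # 2 2002 2003
```

On the real `df`, the same filter tells you the first and last years that have a Per Capita figure.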

Apart from the missing values, every Per Capita entry is a number with two decimal places, so we can change the column to a float!

```python
df = df.astype({'Per Capita': 'float64'})
```
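Assuming the Per Capita column holds `None` for missing entries and `Decimal` objects otherwise (a guess about what `read()` returns, not something shown above), this sketch shows what `astype('float64')` does to such a column: the `None` entries become `NaN` and the rest become ordinary floats.

```python
from decimal import Decimal

import pandas as pd

# Hypothetical object column: None for missing years, Decimal for recorded ones
per_capita = pd.Series([None, Decimal('1.12'), Decimal('0.69')], dtype=object)

as_float = per_capita.astype('float64')
print(as_float.tolist())  # [nan, 1.12, 0.69]
```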

#### We also want the ratio of Liquid to Solid fuel for an arbitrary reason
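`.loc` is the recommended way to set values on a selection of rows and columns, because it avoids chained assignment. For a brand-new column like this one, plain `df['L/S_Ratio'] = ...` works just as well.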
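Where `.loc` really matters is setting values on a subset of rows. Chained indexing such as `df[mask]['col'] = value` assigns into a temporary copy, so `df` is left unchanged (pandas warns about it), while a single `.loc[row_selector, column]` call writes into the frame itself. A small sketch on a toy frame (the `Era` column is hypothetical):

```python
import pandas as pd

toy = pd.DataFrame({'Year': [1751, 1800, 1900, 2000]})

# Create the column for every row, then overwrite a subset of rows with one .loc call
toy['Era'] = 'modern'
toy.loc[toy['Year'] < 1850, 'Era'] = 'early'

print(toy['Era'].tolist())  # ['early', 'early', 'modern', 'modern']
```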

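```python
df.loc[:, 'L/S_Ratio'] = df['Liquid Fuel'] / df['Solid Fuel']
df.head()
```

|   | Year | Total | Gas Fuel | Liquid Fuel | Solid Fuel | Cement | Gas Flaring | Per Capita | L/S_Ratio |
|---|------|-------|----------|-------------|------------|--------|-------------|------------|-----------|
| 0 | 1751 | 3 | 0 | 0 | 3 | 0 | 0 |  | 0.0 |
| 1 | 1752 | 3 | 0 | 0 | 3 | 0 | 0 |  | 0.0 |
| 2 | 1753 | 3 | 0 | 0 | 3 | 0 | 0 |  | 0.0 |
| 3 | 1754 | 3 | 0 | 0 | 3 | 0 | 0 |  | 0.0 |
| 4 | 1755 | 3 | 0 | 0 | 3 | 0 | 0 |  | 0.0 |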


Notice that the ratio is a float even though both columns are integers: `/` is true division, which always returns a float in pandas, just as in Python 3 (use `//` for floor division).
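A quick sketch of the division behavior on toy Series (values made up): `/` returns `float64` for integer inputs, and dividing by zero gives `inf` (or `NaN` for 0/0) instead of raising, which is worth checking for if Solid Fuel could ever be zero.

```python
import numpy as np
import pandas as pd

liquid = pd.Series([0, 6, 5, 0])
solid = pd.Series([3, 4, 0, 0])

ratio = liquid / solid        # true division -> float64
print(ratio.dtype)            # float64
print(ratio.tolist())         # [0.0, 1.5, inf, nan]

# Optionally treat infinite ratios as missing as well
clean = ratio.replace([np.inf, -np.inf], np.nan)
```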

Alright, looks good for now.

#### Save as a CSV for later use

```python
# %time is an IPython magic: it reports how long the statement took to run
%time df.to_csv('co2.csv', index=False)
print('Completed: %s' % dt.now().strftime('%m/%d %H:%M\n'))
```

```
Wall time: 3 ms
Completed: 03/09 14:28
```
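To check the saved file, you can read it back with `pd.read_csv`. Because the file is written with `index=False`, no extra `Unnamed: 0` index column appears on reload, and missing Per Capita values (written as empty fields) come back as `NaN`. A self-contained sketch using a toy frame and a temporary directory instead of the real `co2.csv`:

```python
import os
import tempfile

import pandas as pd

# Toy frame with made-up values, mirroring the dtypes of df
toy = pd.DataFrame({
    'Year': [2000, 2001],
    'Total': [3, 3],
    'Per Capita': [float('nan'), 1.12],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'co2.csv')
    toy.to_csv(path, index=False)   # no index column written
    back = pd.read_csv(path)

print(back.columns.tolist())  # ['Year', 'Total', 'Per Capita']
print(back.equals(toy))       # True
```

Saving without `index=False` would instead give you an `Unnamed: 0` column holding the old index when you read the file back.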

