## **Read JSON File**
- JSON (JavaScript Object Notation) is a text-based data-interchange format.

**JSON is mainly built on two structures:**

A collection of key/value pairs. In Python this corresponds to a dictionary (`dict`): each key is unique, whereas values may repeat.

An ordered list of values. In Python this corresponds to a `list`, whose items can be strings, integers, other lists, and so on.
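As a minimal sketch of how these two structures map onto Python types, the standard-library `json` module parses a JSON object into a `dict` and a JSON array into a `list`. The sample text below is made up for illustration.

```python
import json

# Hypothetical JSON text: an object (key/value pairs) holding an array of arrays.
text = '{"name": "Jessy", "scores": [[1, 2], [3, 4]]}'

parsed = json.loads(text)

print(type(parsed))            # the JSON object becomes a dict
print(type(parsed["scores"]))  # the JSON array becomes a list
print(parsed["scores"][1][0])  # nested lists keep their order
```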

In [600]:
#Read a JSON dataset from a website
import pandas as pd
json = pd.read_json('https://raw.githubusercontent.com/chrisalbon/simulated_datasets/master/data.json')
json.head()

Unnamed: 0,integer,datetime,category
0,5,2015-01-01 00:00:00,0
1,5,2015-01-01 00:00:01,0
2,9,2015-01-01 00:00:02,0
3,6,2015-01-01 00:00:03,0
4,6,2015-01-01 00:00:04,0


In [601]:
#Check the shape
json.shape

(100, 3)

In [602]:
#Save the online dataset to your drive
json.to_json('/content/gdrive/My Drive/Colab_Notebooks/Sample_data/dataframe.json')

In [603]:
#Open the dataset from drive
import pandas as pd

from google.colab import drive
drive.mount('/content/gdrive')

dataframe = pd.read_json('/content/gdrive/My Drive/Colab_Notebooks/Sample_data/dataframe.json')
dataframe.head()

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


Unnamed: 0,integer,datetime,category
0,5,2015-01-01 00:00:00,0
1,5,2015-01-01 00:00:01,0
2,9,2015-01-01 00:00:02,0
3,6,2015-01-01 00:00:03,0
4,6,2015-01-01 00:00:04,0


In [604]:
type(dataframe)

pandas.core.frame.DataFrame

**Create a DataFrame from a JSON string**

In [605]:
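#Create a DataFrame from a JSON string
from io import StringIO
import pandas as pd

data = '{"employee_name":"Jessy", "email": "jessy@gmail.com", "Job_profile" : [{"title1" : "officer", "title2": "Director"}]}'
#Wrap the string in StringIO: newer pandas versions deprecate passing literal JSON text to read_json
df1 = pd.read_json(StringIO(data))
print(df1)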

  employee_name            email                                  Job_profile
0         Jessy  jessy@gmail.com  {'title1': 'officer', 'title2': 'Director'}


**Convert the DataFrame to a different JSON layout**

In [606]:
df1.to_json(orient="records")

'[{"employee_name":"Jessy","email":"jessy@gmail.com","Job_profile":{"title1":"officer","title2":"Director"}}]'
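The `orient` argument controls the layout of the JSON that `to_json` produces. The following sketch compares three common layouts on a small made-up frame; the column names and values are illustrative only.

```python
import json
import pandas as pd

# Small illustrative frame (hypothetical values).
frame = pd.DataFrame({"name": ["Jessy", "Sam"], "age": [30, 25]})

# Default for a DataFrame is orient="columns": {column -> {index -> value}}
as_columns = frame.to_json()
# orient="records": a list with one object per row
as_records = frame.to_json(orient="records")
# orient="split": separate "columns", "index" and "data" entries
as_split = frame.to_json(orient="split")

print(as_columns)
print(as_records)
print(as_split)
```

The `records` layout is usually the easiest to hand to other tools, while `split` keeps the index and column labels separate from the values.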

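**Read a JSON file from drive**
- `pd.read_json` can fail on nested JSON with the error 'Mixing dicts with non-Series may lead to ambiguous ordering'. To avoid it, we can load the file with Python's built-in `json` module instead, which returns a dictionary rather than a DataFrame.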

In [607]:
import json
import pandas as pd

from google.colab import drive
drive.mount('/content/gdrive')

#json.load returns plain Python objects (here a dict), not a DataFrame
#The with block closes the file automatically
with open('/content/gdrive/My Drive/Colab_Notebooks/Sample_data/sample2.json') as data:
    df = json.load(data)
print(df)


Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).
{'firstName': 'Joe', 'lastName': 'Jackson', 'gender': 'male', 'age': 28, 'address': {'streetAddress': '101', 'city': 'San Diego', 'state': 'CA'}, 'phoneNumbers': [{'type': 'home', 'number': '7349282382'}]}
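If a DataFrame is still wanted from a nested record like the one printed above, one possible route is `pd.json_normalize`, which flattens nested dictionaries into columns. This is a sketch that retypes the printed record by hand rather than reading `sample2.json`.

```python
import pandas as pd

# The record printed above, written out as a Python dict.
record = {
    "firstName": "Joe", "lastName": "Jackson", "gender": "male", "age": 28,
    "address": {"streetAddress": "101", "city": "San Diego", "state": "CA"},
    "phoneNumbers": [{"type": "home", "number": "7349282382"}],
}

# Nested dicts are expanded into dotted column names such as "address.city".
flat = pd.json_normalize(record)
print(flat.columns.tolist())

# record_path expands the list of phone numbers into one row per entry;
# meta carries selected top-level fields alongside each row.
phones = pd.json_normalize(record, record_path="phoneNumbers",
                           meta=["firstName", "lastName"])
print(phones)
```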


**Read a CSV file from a website**

In [608]:
#Read a CSV file directly from a website; header=None because the file has no header row
df1 = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data', header=None)
df1.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13
0,1,14.23,1.71,2.43,15.6,127,2.8,3.06,0.28,2.29,5.64,1.04,3.92,1065
1,1,13.2,1.78,2.14,11.2,100,2.65,2.76,0.26,1.28,4.38,1.05,3.4,1050
2,1,13.16,2.36,2.67,18.6,101,2.8,3.24,0.3,2.81,5.68,1.03,3.17,1185
3,1,14.37,1.95,2.5,16.8,113,3.85,3.49,0.24,2.18,7.8,0.86,3.45,1480
4,1,13.24,2.59,2.87,21.0,118,2.8,2.69,0.39,1.82,4.32,1.04,2.93,735
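The effect of `header=None` can be sketched offline with an in-memory file. The two rows below are taken from the output above and cut to four fields for brevity. The labels passed to `names` are placeholders, not the dataset's official column names.

```python
from io import StringIO
import pandas as pd

# First two rows of the wine data shown above, truncated to four fields.
raw = "1,14.23,1.71,2.43\n1,13.2,1.78,2.14\n"

# header=None: the first line is data, so columns are numbered 0, 1, 2, ...
numbered = pd.read_csv(StringIO(raw), header=None)
print(numbered.columns.tolist())

# names=...: supply labels while still treating every line as data.
# These labels are placeholders chosen for this sketch.
named = pd.read_csv(StringIO(raw), header=None,
                    names=["col_a", "col_b", "col_c", "col_d"])
print(named)
```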


## **Reading HTML Content**

### **Example : 01 (World Coin)**
- **Using the `requests` library**

In [609]:
#Read a list of data from website
import pandas as pd
import requests

url = 'https://www.worldcoinindex.com/'
cryp_url = requests.get(url)
cryp_url

<Response [200]>

In [610]:
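#Parse every HTML table on the page, then check the length and type of the result
#Wrap the text in StringIO: newer pandas versions deprecate passing literal HTML to read_html
from io import StringIO
cryp_data = pd.read_html(StringIO(cryp_url.text))
len(cryp_data), type(cryp_data)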

(1, list)

In [611]:
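cryp_data = cryp_data[0]
#Skip the first two columns ('#' and 'Unnamed: 1')
#Copy the slice so that later changes to cryp_fin do not trigger a SettingWithCopyWarning
cryp_fin = cryp_data.iloc[:, 2:].copy()
cryp_fin.head()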

Unnamed: 0,Name,Ticker,Last price,%,24 high,24 low,Price Charts 7d,24 volume,# Coins,Market cap
0,Ethereum,ETH,"$ 2,610.27",+2.81%,"$ 2,635.56","$ 2,524.38",,$ 8.89B,116.92M,$ 305.20B
1,Bitcoin,BTC,"$ 41,913",+0.82%,"$ 42,534","$ 41,322",,$ 8.79B,18.77M,$ 786.84B
2,Ripple,XRP,$ 0.767960,+2.75%,$ 0.775921,$ 0.743083,,$ 1.75B,46.31B,$ 35.56B
3,Polkadot,DOT,$ 18.50,+10.02%,$ 19.27,$ 16.64,,$ 1.24B,947.85M,$ 17.53B
4,Ethereumclassic,ETC,$ 52.93,+3.03%,$ 53.89,$ 50.97,,$ 1.12B,127.32M,$ 6.73B


**Dropping NaN**

In [612]:
del cryp_fin['Price Charts 7d'] #Every value in this column is NaN, so drop the whole column
cryp_fin.tail(10)

Unnamed: 0,Name,Ticker,Last price,%,24 high,24 low,24 volume,# Coins,Market cap
92,Status,SNT,$ 0.085514,+4.37%,$ 0.089355,$ 0.081355,$ 58.09M,3.47B,$ 296.77M
93,Iexec,RLC,$ 3.58,+7.75%,$ 3.77,$ 3.27,$ 57.14M,86.99M,$ 311.70M
94,Coin98,C98,$ 1.20,-6.78%,$ 1.29,$ 1.19,$ 54.11M,,
95,Kyber,KNC,$ 1.67,+1.63%,$ 1.74,$ 1.61,$ 53.83M,180.93M,$ 301.33M
96,Sentinelprotocol,UPP,$ 0.178773,+4.89%,$ 0.184290,$ 0.168010,$ 51.34M,,
97,Misbloc,MSB,$ 0.671108,-0.66%,$ 0.684032,$ 0.666619,$ 50.79M,,
98,Gas,GAS,$ 8.40,-2.60%,$ 8.63,$ 8.36,$ 50.23M,9.13M,$ 76.76M
99,Wootrade,WOO,$ 0.711118,-1.94%,$ 0.745606,$ 0.695313,$ 50.12M,,
100,Stratis,STRAX,$ 1.95,-1.37%,$ 1.99,$ 1.90,$ 48.99M,100.07M,$ 195.34M
101,Dai,DAI,$ 1.00,-0.07%,$ 1.01,$ 0.995359,$ 48.97M,,


In [613]:
#Now drop every row that still contains a NaN
cryp_fin = cryp_fin.dropna()
cryp_fin.tail(10)

Unnamed: 0,Name,Ticker,Last price,%,24 high,24 low,24 volume,# Coins,Market cap
85,1inch,1INCH,$ 2.44,+2.43%,$ 2.48,$ 2.37,$ 69.82M,150.91M,$ 368.77M
86,Algorand,ALGO,$ 0.872167,+3.56%,$ 0.876278,$ 0.836507,$ 66.72M,430.28M,$ 375.28M
88,0x,ZRX,$ 0.841905,+2.43%,$ 0.843468,$ 0.816640,$ 60.80M,587.71M,$ 494.79M
89,Bitcoingold,BTG,$ 52.99,+2.05%,$ 54.83,$ 51.52,$ 59.10M,17.51M,$ 928.10M
90,Avalanche,AVAX,$ 13.85,+1.91%,$ 14.08,$ 13.43,$ 58.86M,128.01M,$ 1.77B
92,Status,SNT,$ 0.085514,+4.37%,$ 0.089355,$ 0.081355,$ 58.09M,3.47B,$ 296.77M
93,Iexec,RLC,$ 3.58,+7.75%,$ 3.77,$ 3.27,$ 57.14M,86.99M,$ 311.70M
95,Kyber,KNC,$ 1.67,+1.63%,$ 1.74,$ 1.61,$ 53.83M,180.93M,$ 301.33M
98,Gas,GAS,$ 8.40,-2.60%,$ 8.63,$ 8.36,$ 50.23M,9.13M,$ 76.76M
100,Stratis,STRAX,$ 1.95,-1.37%,$ 1.99,$ 1.90,$ 48.99M,100.07M,$ 195.34M
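The order of the two steps above matters: calling `dropna()` while an all-NaN column is still present would remove every row. The sketch below reproduces the idea on a made-up three-row frame. It uses `dropna(axis=1, how="all")` as an alternative to `del`, and shows that the surviving rows keep their original index labels, which explains the gaps in the index above.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the coin table: one all-NaN column, one partly missing.
coins = pd.DataFrame({
    "Name": ["Ethereum", "Coin98", "Gas"],
    "Chart": [np.nan, np.nan, np.nan],
    "Market cap": ["$ 305.20B", np.nan, "$ 76.76M"],
})

# Drop columns in which every value is NaN (an alternative to del).
no_empty_cols = coins.dropna(axis=1, how="all")

# Drop rows that still contain at least one NaN; original index labels are kept.
complete_rows = no_empty_cols.dropna()
print(complete_rows)

# reset_index(drop=True) renumbers the rows from 0 if the gaps are unwanted.
renumbered = complete_rows.reset_index(drop=True)
print(renumbered)
```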


### **Example : 02 (Wikipedia Minnesota)**

In [614]:
#Read only the tables on the page whose text matches the given string
import pandas as pd

url1 = pd.read_html('https://en.wikipedia.org/wiki/Minnesota', match='Election results from statewide races')
print(url1)

[    Year     Office    GOP    DFL Others
0   2020  President  45.3%  52.4%   2.3%
1   2020    Senator  43.5%  48.8%   7.7%
2   2018   Governor  42.4%  53.9%   3.7%
3   2018    Senator  36.2%  60.3%   3.4%
4   2018    Senator  42.4%  53.0%   4.6%
5   2016  President  44.9%  46.4%   8.6%
6   2014   Governor  44.5%  50.1%   5.4%
7   2014    Senator  42.9%  53.2%   3.9%
8   2012  President  45.1%  52.8%   2.1%
9   2012    Senator  30.6%  65.3%   4.1%
10  2010   Governor  43.2%  43.7%  13.1%
11  2008  President  43.8%  54.1%   2.1%
12  2008    Senator  42.0%  42.0%  16.0%
13  2006   Governor  46.7%  45.7%   7.6%
14  2006    Senator  37.9%  58.1%   4.0%
15  2004  President  47.6%  51.1%   1.3%
16  2002   Governor  44.4%  33.5%  22.1%
17  2002    Senator  49.5%  47.3%   1.0%
18  2000  President  45.5%  47.9%   6.6%
19  2000    Senator  43.3%  48.8%   7.9%
20  1998   Governor  34.3%  28.1%  37.6%
21  1996  President  35.0%  51.1%  13.9%
22  1996    Senator  41.3%  50.3%   8.4%
23  1994   Gove

In [615]:
#Check the len and type
len(url1), type(url1)

(1, list)

In [616]:
#read_html returns a list of DataFrames; take the first (and only) one
df = url1[0]
df.head()

Unnamed: 0,Year,Office,GOP,DFL,Others
0,2020,President,45.3%,52.4%,2.3%
1,2020,Senator,43.5%,48.8%,7.7%
2,2018,Governor,42.4%,53.9%,3.7%
3,2018,Senator,36.2%,60.3%,3.4%
4,2018,Senator,42.4%,53.0%,4.6%


### **Example : 03 (Mobile Country)**

In [617]:
import pandas as pd
url2 = 'https://en.wikipedia.org/wiki/Mobile_country_code'
read = pd.read_html(url2, match='Country', header=0)
df = read[0]
df.head()

Unnamed: 0,Mobile country code,Country,ISO 3166,Mobile network codes,National MNC authority,Remarks
0,289,A Abkhazia,GE-AB,List of mobile network codes in Abkhazia,,MCC is not listed by ITU
1,412,Afghanistan,AF,List of mobile network codes in Afghanistan,,
2,276,Albania,AL,List of mobile network codes in Albania,,
3,603,Algeria,DZ,List of mobile network codes in Algeria,,
4,544,American Samoa (United States of America),AS,List of mobile network codes in American Samoa,,


In [618]:
len(df), type(df)

(252, pandas.core.frame.DataFrame)

## **Reading Excel Content**

In [619]:
import pandas as pd

from google.colab import drive
drive.mount('/content/gdrive')

excel_data = pd.read_excel('/content/gdrive/My Drive/Colab_Notebooks/Sample_data/FinalData.xlsx')
del excel_data['Unnamed: 0']
excel_data.head()

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


Unnamed: 0,Time_offset_ms,Type,ID_hex,Data_Length_Code,Data_bytes_hex,Data_bytes_hex_NoSpace
0,24208.4,Rx,1025,8,72339069014704128,72339069014704128
1,24216.7,Rx,1024,8,281474976710656,281474976710656
2,24225.7,Rx,1039,8,1081145385545694976,1081145385545694976
3,24232.1,Rx,1042,8,1297318167659414016,1297318167659414016
4,24288.5,Rx,272,8,0,0


In [620]:
type(excel_data)

pandas.core.frame.DataFrame

## **Reading Pickle Content**
- Pickle is a Python-specific binary serialization format which, unlike JSON, is not human-readable. It is used to serialize and deserialize Python object structures: an object such as a DataFrame, list, or dictionary is converted into a byte stream that can be saved to disk and loaded back later.

The best part about Pickle is that it can store almost any Python data type.

Pickle is widely used for storing trained machine learning models. Like the `json` module, `pickle` provides `pickle.load()` for reading a pickled file and `pickle.dump()` for writing a Python object to a file in Pickle format.

*Another important advantage of Pickle is that a DataFrame saved as a Pickle file keeps its data types intact when reloaded, and the file often takes less space on disk than a text format.*

- All pandas objects have a `to_pickle` method, which uses Python's `pickle` module to save the data structure to disk in the pickle format.
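To illustrate the point about data types surviving a round trip, here is a small sketch using a made-up frame with a datetime column and a categorical column. It uses both the pandas wrappers (`to_pickle` / `read_pickle`) and the standard-library `pickle.dumps` / `pickle.loads`. The file name and the temporary directory are assumptions for the example.

```python
import os
import pickle
import tempfile

import pandas as pd

# Hypothetical frame with a datetime column and a categorical column.
original = pd.DataFrame({
    "when": pd.to_datetime(["2015-01-01 00:00:00", "2015-01-01 00:00:01"]),
    "category": pd.Categorical(["a", "b"]),
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "frame.pickle")

    # pandas convenience wrappers around the pickle module
    original.to_pickle(path)
    restored = pd.read_pickle(path)

# The standard-library calls also work in memory: dumps() returns bytes.
payload = pickle.dumps(original, protocol=pickle.HIGHEST_PROTOCOL)
restored_again = pickle.loads(payload)

print(type(payload))
print(restored.dtypes)
print(restored.equals(original), restored_again.equals(original))
```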

In [623]:
#cryp_fin was created in the HTML section
import pickle
with open('cryp_fin.pickle', 'wb') as sub_data:
  pickle.dump(cryp_fin, sub_data, protocol=pickle.HIGHEST_PROTOCOL)
cryp_fin = pd.read_pickle('cryp_fin.pickle')
cryp_fin.head()

Unnamed: 0,Name,Ticker,Last price,%,24 high,24 low,24 volume,# Coins,Market cap
0,Ethereum,ETH,"$ 2,610.27",+2.81%,"$ 2,635.56","$ 2,524.38",$ 8.89B,116.92M,$ 305.20B
1,Bitcoin,BTC,"$ 41,913",+0.82%,"$ 42,534","$ 41,322",$ 8.79B,18.77M,$ 786.84B
2,Ripple,XRP,$ 0.767960,+2.75%,$ 0.775921,$ 0.743083,$ 1.75B,46.31B,$ 35.56B
3,Polkadot,DOT,$ 18.50,+10.02%,$ 19.27,$ 16.64,$ 1.24B,947.85M,$ 17.53B
4,Ethereumclassic,ETC,$ 52.93,+3.03%,$ 53.89,$ 50.97,$ 1.12B,127.32M,$ 6.73B
