#### Reading Data from Different Resources
- Let's learn how to read data from different sources using the built-in functions available in pandas.

In [4]:
import pandas as pd
from io import StringIO
data = '{"employee_name": "James","email": "james@gmail.com", "job_profile":[{"title1":"Team Lead","title2":"Sr.Developer"}]}'

# `data` holds a JSON string. Let's load it into pandas and create a DataFrame.

# pd.read_json converts JSON into a DataFrame; wrapping the string in StringIO passes it as a file-like object.
df = pd.read_json(StringIO(data))
df.to_json(orient="index")
# to_json converts the DataFrame back into a JSON string.
# With orient="index" the result is keyed by row label, so it looks different from the source JSON.
# With orient="records" the result is a list with one object per row, which is closer to the source.
df.to_json(orient="records")

'[{"employee_name":"James","email":"james@gmail.com","job_profile":{"title1":"Team Lead","title2":"Sr.Developer"}}]'
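
To make the difference between the two `orient` values easier to see, here is a small sketch that reuses the same sample string and parses each result back with the standard `json` module. The variable names `by_index` and `by_records` are only illustrative.

```python
import json
from io import StringIO

import pandas as pd

data = '{"employee_name": "James","email": "james@gmail.com", "job_profile":[{"title1":"Team Lead","title2":"Sr.Developer"}]}'
df = pd.read_json(StringIO(data))

# orient="index": a dict keyed by row label, each value holding that row's columns.
by_index = json.loads(df.to_json(orient="index"))

# orient="records": a list with one dict per row and no row labels.
by_records = json.loads(df.to_json(orient="records"))

print(type(by_index).__name__, list(by_index.keys()))
print(type(by_records).__name__, len(by_records))
```

Both results carry the same row data; only the outer wrapping changes, so the choice depends on what the consumer of the JSON expects.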

In [None]:
!pip install lxml

In [9]:
# The URL below points to a comma-separated (CSV) file. Let's load it into a DataFrame.
url = "https://gist.githubusercontent.com/tijptjik/9408623/raw/b237fa5848349a14a14e5d4107dc7897c21951f5/wine.csv"
df = pd.read_csv(url, header=None)
df.to_csv("./data_set/wine.csv")
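
The same calls can be tried without a network connection. The sketch below assumes a tiny made-up CSV string with no header row (the values and the column names are hypothetical, not taken from the wine file) and shows what `header=None` does and how `names` can label the columns.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample standing in for a remote CSV file without a header row.
csv_text = "1,14.23,1.71\n1,13.20,1.78\n2,12.37,0.94\n"

# header=None: no line is treated as a header, so columns are numbered 0, 1, 2.
raw_df = pd.read_csv(StringIO(csv_text), header=None)

# names=...: supply column labels (made up for this example).
named_df = pd.read_csv(
    StringIO(csv_text),
    header=None,
    names=["wine_class", "alcohol", "malic_acid"],
)

# With no path, to_csv returns the CSV text; index=False leaves out the row index.
csv_out = named_df.to_csv(index=False)
print(csv_out)
```

Passing `index=False` when saving avoids an extra unnamed index column appearing the next time the file is read.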

In [None]:
# Suppose an HTML page contains tabular data; pd.read_html can scrape it.
# read_html returns a list of DataFrames, one per matching table, which is why we index with df[0].
url = "https://www.fdic.gov/bank-failures/failed-bank-list?combine=&items_per_page=All"
df = pd.read_html(url, match="Acquiring Institution")
df[0].isnull().sum()

In [None]:
# Let's try a Wikipedia page that contains a table and scrape it.
url = "https://en.wikipedia.org/wiki/List_of_states_and_union_territories_of_India_by_population"
df = pd.read_html(url, match="Rank", header=None)
df[0].isnull().sum()

In [None]:
url = "https://en.wikipedia.org/wiki/Mobile_country_code"
df = pd.read_html(url, match="Country", header=None)
df[0]

# This is how we can scrape an entire table from an HTML page.
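
Since live pages change over time, here is a minimal offline sketch of how `match` selects tables. The HTML string and the bank names in it are invented for illustration; the `try`/`except` is there only because `read_html` relies on an optional parser such as lxml.

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables; only the first mentions "Acquiring Institution".
html = """
<html><body>
<table>
  <tr><th>Bank Name</th><th>Acquiring Institution</th></tr>
  <tr><td>Example Bank A</td><td>Example Bank B</td></tr>
  <tr><td>Example Bank C</td><td>Example Bank D</td></tr>
</table>
<table>
  <tr><th>Country</th><th>Code</th></tr>
  <tr><td>Exampleland</td><td>999</td></tr>
</table>
</body></html>
"""

try:
    # match keeps only the tables whose text matches the given string or regex.
    tables = pd.read_html(StringIO(html), match="Acquiring Institution")
except ImportError:
    # No HTML parser is installed (for example lxml), so nothing can be parsed.
    tables = []

print(len(tables))
for table in tables:
    # Count of missing values per column, as in the cells above.
    print(table.isnull().sum())
```

Because the result is always a list, it helps to check its length before indexing, since a page may contain more than one table that matches.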

In [None]:
# Read the .xlsx file into a DataFrame (pandas needs the openpyxl package for .xlsx files).
df = pd.read_excel("data_set/excel_file.xlsx")
display(df)
# By default, read_excel loads the first sheet of the workbook.


#### Pickle File
  - A pickle file in Python is a file format used for serializing and deserializing Python object structures. The process of converting a Python object into a byte stream is called "pickling," while the reverse process of recreating the object from the byte stream is called "unpickling." This mechanism is similar to object serialization in other programming languages like Java. The pickle module is particularly useful for saving and loading complex data structures like lists, dictionaries, and even custom class instances.
  
  - Pickle files are binary files, and they are specific to Python. This means that data pickled in Python can only be reliably unpickled using Python. While this can be a limitation for cross-language compatibility, it makes pickle very efficient for storing and retrieving Python objects within Python applications.
  
  - It's important to note that the pickle module is not secure against maliciously constructed data. Unpickling data from untrusted sources can lead to arbitrary code execution. Therefore, it's crucial to only unpickle data that comes from a trusted source. If data integrity and security are paramount, consider using safer serialization formats like JSON, especially when dealing with external or untrusted data. 
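
The points above can be sketched with the standard library alone. The sample dictionary is made up for illustration; it shows pickling to bytes, unpickling back to an equal object, and a JSON round trip for plain data.

```python
import json
import pickle

# Made-up sample object containing a tuple, which JSON cannot represent as such.
original = {"name": "James", "titles": ("Team Lead", "Sr.Developer"), "active": True}

# Pickling: Python object -> byte stream.
payload = pickle.dumps(original)

# Unpickling: byte stream -> equal Python object. Only do this with trusted data.
restored = pickle.loads(payload)
print(type(payload).__name__, restored == original)

# JSON covers plain data only (dicts, lists, strings, numbers, booleans, None),
# but loading it cannot run code, so it suits external or untrusted input better.
plain = {"name": "James", "titles": ["Team Lead", "Sr.Developer"]}
from_json = json.loads(json.dumps(plain))
print(from_json == plain)
```

Pickle keeps Python-specific types such as tuples intact, while JSON would turn the tuple into a list; that is the trade-off between fidelity and safety described above.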

In [None]:
# Here we save a DataFrame to a pickle file. excel_df holds the DataFrame read from Excel.
excel_df = pd.read_excel("data_set/excel_file.xlsx")
excel_df.to_pickle("data_set/df_excel_pickle")

# Pickle files hold serialized binary data, so they can't be opened and read as plain text.
# To load the DataFrame back (unpickle it), use the pd.read_pickle function.
unpickled_file = pd.read_pickle("./data_set/df_excel_pickle")
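
To check that nothing is lost in the round trip, here is a small self-contained sketch. It assumes a made-up DataFrame in place of the Excel data and writes to a temporary directory, so it does not depend on the `data_set` folder.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up data standing in for the contents of the Excel sheet.
df = pd.DataFrame(
    {
        "employee_name": ["James", "Maria"],
        "joined": pd.to_datetime(["2020-01-15", "2021-06-01"]),
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_path = Path(tmp_dir) / "df_pickle.pkl"
    df.to_pickle(pickle_path)               # serialize the DataFrame to disk
    restored = pd.read_pickle(pickle_path)  # load it back (trusted file only)

# equals() compares values and dtypes, so this checks the round trip.
print(restored.equals(df))
print(restored.dtypes)
```

Unlike a CSV round trip, the datetime column comes back with its dtype intact, which is one practical reason to use pickle for intermediate results inside a Python workflow.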