### Reading Data From Different Sources


The first example reads a JSON string into a pandas DataFrame. Step by step:

1.  **`import pandas as pd`**: Imports the Pandas library, giving you tools for working with DataFrames.
2.  **`from io import StringIO`**: Imports `StringIO`, which lets you treat a string like a file.
3.  **`Data = '{"employee_name": "James", "email": "james@gmail.com", "job_profile": [{"title1":"Team Lead", "title2":"Sr. Developer"}]}'`**: This is a JSON string containing information about an employee. Notice `job_profile` is a list containing a dictionary.
4.  **`df=pd.read_json(StringIO(Data))`**:
    * `StringIO(Data)`: Makes the JSON string behave like a readable file.
    * `pd.read_json(...)`: Reads the JSON data from the "file" and creates a Pandas DataFrame named `df`.

In short, this code takes a JSON string and turns it into a table-like structure (DataFrame) that you can easily work with using Pandas. Because the `job_profile` list has a single element, the result has one row, and the `job_profile` cell in that row holds the dictionary itself rather than a list.


`StringIO` is a class in Python's `io` module that wraps a string in a file-like object. This is convenient with libraries such as pandas, whose reader functions are designed to accept paths or file-like objects: you can parse in-memory text without writing a temporary file first. It also matches what `pd.read_json()` expects. Recent pandas releases (2.1 and later) deprecate passing a literal JSON string directly and recommend wrapping it in `StringIO` instead.
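As a minimal sketch of that file-like behaviour, the snippet below wraps a small CSV string in `StringIO`, reads one line from it the way you would from an open file, then hands the same buffer to `pd.read_csv`. The sample text and the names `csv_text`, `buffer`, and `df_csv` are made up for illustration and are not part of the notebook.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data held entirely in memory
csv_text = "name,age\nKaran,32\nJack,34\n"

buffer = StringIO(csv_text)
first_line = buffer.readline()  # file-style read of the header line
buffer.seek(0)                  # rewind so pandas sees the whole text

df_csv = pd.read_csv(buffer)
print(repr(first_line))
print(df_csv)
```

The `seek(0)` call matters: like a real file, the buffer remembers its read position, so without the rewind pandas would start after the header line.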

In [20]:
import pandas as pd
from io import StringIO

Data = '{"employee_name": "James", "email": "james@gmail.com", "job_profile": [{"title1":"Team Lead", "title2":"Sr. Developer"}]}'
print(type(Data))
df=pd.read_json(StringIO(Data))

<class 'str'>


In [21]:
df

| | employee_name | email | job_profile |
|---|---|---|---|
| 0 | James | james@gmail.com | {'title1': 'Team Lead', 'title2': 'Sr. Develop... |
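The truncated `job_profile` cell in the output above is an ordinary Python dictionary. The following sketch repeats the read and then looks inside that cell. The variable name `cell` is an illustrative choice.

```python
from io import StringIO

import pandas as pd

Data = '{"employee_name": "James", "email": "james@gmail.com", "job_profile": [{"title1":"Team Lead", "title2":"Sr. Developer"}]}'
df = pd.read_json(StringIO(Data))

# Pull the single job_profile cell out of row 0 and inspect it
cell = df.loc[0, "job_profile"]
print(type(cell))
print(cell["title1"])
```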


`df.to_json()` converts the entire DataFrame `df` into a JSON (JavaScript Object Notation) string. With no arguments it uses the `columns` orientation for a DataFrame, so each column name maps to an object keyed by row label. JSON is widely used for transmitting and storing data because it is lightweight and readable from many programming languages.

In [6]:
df.to_json()

'{"employee_name":{"0":"James"},"email":{"0":"james@gmail.com"},"job_profile":{"0":{"title1":"Team Lead","title2":"Sr. Developer"}}}'

In [None]:
# display the docstring for the to_json() function, which clearly lists and describes 
# all the possible values for the orient parameter.
help(pd.DataFrame.to_json)

Help on function to_json in module pandas.core.generic:

to_json(self, path_or_buf: 'FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None' = None, *, orient: "Literal['split', 'records', 'index', 'table', 'columns', 'values'] | None" = None, date_format: 'str | None' = None, double_precision: 'int' = 10, force_ascii: 'bool_t' = True, date_unit: 'TimeUnit' = 'ms', default_handler: 'Callable[[Any], JSONSerializable] | None' = None, lines: 'bool_t' = False, compression: 'CompressionOptions' = 'infer', index: 'bool_t | None' = None, indent: 'int | None' = None, storage_options: 'StorageOptions | None' = None, mode: "Literal['a', 'w']" = 'w') -> 'str | None'
    Convert the object to a JSON string.

    Note NaN's and None will be converted to null and datetime objects
    will be converted to UNIX timestamps.

    Parameters
    ----------
    path_or_buf : str, path object, file-like object, or None, default None
        String, path object (implementing os.PathLike[str]), or file-like

`df.to_json(orient='index')` converts your DataFrame `df` into a JSON string. The `orient='index'` part means each **row** of your DataFrame becomes a separate JSON object. The keys of these objects are the **index labels** of the rows, and the values are dictionaries containing the column names and their corresponding row values.

**Example:**

If your DataFrame `df` looks like this:

```
   col1  col2
a     1     3
b     2     4
```

Then `df.to_json(orient='index')` would produce a JSON string like this:

```json
{"a":{"col1":1,"col2":3},"b":{"col1":2,"col2":4}}
```
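The example above can be reproduced with a few lines. This sketch builds the same two-row DataFrame (the name `df_small` is illustrative) and parses the JSON back with the standard `json` module to show that the row labels become the top-level keys.

```python
import json

import pandas as pd

# Same data as the table above, with 'a' and 'b' as index labels
df_small = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]}, index=["a", "b"])

json_index = df_small.to_json(orient="index")
print(json_index)

# Parse it back into plain Python objects to inspect the structure
parsed = json.loads(json_index)
print(parsed["a"])
```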

In [22]:
df.to_json(orient='index')

'{"0":{"employee_name":"James","email":"james@gmail.com","job_profile":{"title1":"Team Lead","title2":"Sr. Developer"}}}'

`df.to_json(orient='records')` converts your DataFrame `df` into a JSON string where each row becomes a separate JSON object (a dictionary). The `records` orientation arranges the data as a list of these row-objects.

**Short, simple code example:**

```python
import pandas as pd

data = {'col1': [1, 2], 'col2': ['a', 'b']}
df = pd.DataFrame(data)

json_output = df.to_json(orient='records')
print(json_output)
```

**Output:**

```json
[{"col1":1,"col2":"a"},{"col1":2,"col2":"b"}]
```

Each dictionary in the list represents a row from your DataFrame, with keys as column names and values as the row data.

In [5]:
df.to_json(orient='records')

'[{"employee_name":"James","email":"james@gmail.com","job_profile":{"title1":"Team Lead","title2":"Sr. Developer"}}]'
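To compare the orientations side by side, the sketch below serializes one small DataFrame with several `orient` values taken from the `to_json` signature shown earlier. The DataFrame `df_small` and the `results` dictionary are illustrative names, not part of the notebook.

```python
import json

import pandas as pd

df_small = pd.DataFrame({"col1": [1, 2], "col2": ["a", "b"]})

results = {}
for orient in ["columns", "index", "records", "split", "values"]:
    text = df_small.to_json(orient=orient)
    results[orient] = json.loads(text)  # keep a parsed copy for inspection
    print(f"{orient:>8}: {text}")
```

Which orientation to pick depends on the consumer: `records` is a common fit for APIs that expect a list of objects, while `split` keeps column names, index, and data separate.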

## Read Data from Internet URLs

In [26]:
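# wine.data has no header row, so header=None makes pandas number the columns 0..13
df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data", header=None)
# Without header=None, the first data row would be consumed as column names:
# df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data")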

In [25]:
df.head()

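| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 14.23 | 1.71 | 2.43 | 15.6 | 127 | 2.8 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065 |
| 1 | 1 | 13.2 | 1.78 | 2.14 | 11.2 | 100 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.4 | 1050 |
| 2 | 1 | 13.16 | 2.36 | 2.67 | 18.6 | 101 | 2.8 | 3.24 | 0.3 | 2.81 | 5.68 | 1.03 | 3.17 | 1185 |
| 3 | 1 | 14.37 | 1.95 | 2.5 | 16.8 | 113 | 3.85 | 3.49 | 0.24 | 2.18 | 7.8 | 0.86 | 3.45 | 1480 |
| 4 | 1 | 13.24 | 2.59 | 2.87 | 21.0 | 118 | 2.8 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735 |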
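The effect of `header=None` can be checked without a network connection. This sketch feeds the first two rows from the output above to `pd.read_csv` through `StringIO`, once with `header=None` and once with the default. The variable names are illustrative.

```python
from io import StringIO

import pandas as pd

# First two rows of wine.data, copied from the output above
sample = (
    "1,14.23,1.71,2.43,15.6,127,2.8,3.06,0.28,2.29,5.64,1.04,3.92,1065\n"
    "1,13.2,1.78,2.14,11.2,100,2.65,2.76,0.26,1.28,4.38,1.05,3.4,1050\n"
)

with_header_none = pd.read_csv(StringIO(sample), header=None)
default_header = pd.read_csv(StringIO(sample))

print(with_header_none.shape)            # both rows kept as data
print(default_header.shape)              # first row consumed as column names
print(list(default_header.columns)[:3])
```

With the default setting one observation silently disappears into the header, which is why the notebook passes `header=None` for this file.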


In [27]:
# Write object to a comma-separated values (csv) file.
df.to_csv("wine.csv")
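By default `to_csv` also writes the row index as the first, unnamed column. The sketch below shows what that means when the file is read back, using a small illustrative DataFrame (`df_demo`) and in-memory strings instead of files.

```python
from io import StringIO

import pandas as pd

df_demo = pd.DataFrame({"Name": ["Karan", "Jack"], "Age": [32, 34]})

# to_csv returns the CSV text as a string when no path is given
csv_with_index = df_demo.to_csv()
csv_without_index = df_demo.to_csv(index=False)

reread_with_index = pd.read_csv(StringIO(csv_with_index))
reread_without_index = pd.read_csv(StringIO(csv_without_index))

print(list(reread_with_index.columns))
print(list(reread_without_index.columns))
```

If the index carries no information, `index=False` keeps the saved file free of the extra `Unnamed: 0` column.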

In [28]:
!pip install lxml



In [29]:
!pip install html5lib
!pip install beautifulsoup4

Collecting html5lib
  Downloading html5lib-1.1-py2.py3-none-any.whl.metadata (16 kB)
Downloading html5lib-1.1-py2.py3-none-any.whl (112 kB)
Installing collected packages: html5lib
Successfully installed html5lib-1.1


In [30]:
url="https://www.fdic.gov/resources/resolutions/bank-failures/failed-bank-list/"

# read_html returns a list of DataFrames, one per <table> element found on the page
df = pd.read_html(url)

In [31]:
df[0]

| | Bank Name | City | State | Cert | Acquiring Institution | Closing Date | Fund Sort ascending |
|---|---|---|---|---|---|---|---|
| 0 | Pulaski Savings Bank | Chicago | Illinois | 28611 | Millennium Bank | January 17, 2025 | 10548 |
| 1 | The First National Bank of Lindsay | Lindsay | Oklahoma | 4134 | First Bank & Trust Co., Duncan, OK | October 18, 2024 | 10547 |
| 2 | Republic First Bank dba Republic Bank | Philadelphia | Pennsylvania | 27332 | Fulton Bank, National Association | April 26, 2024 | 10546 |
| 3 | Citizens Bank | Sac City | Iowa | 8758 | Iowa Trust & Savings Bank | November 3, 2023 | 10545 |
| 4 | Heartland Tri-State Bank | Elkhart | Kansas | 25851 | Dream First Bank, N.A. | July 28, 2023 | 10544 |
| 5 | First Republic Bank | San Francisco | California | 59017 | JPMorgan Chase Bank, N.A. | May 1, 2023 | 10543 |
| 6 | Signature Bank | New York | New York | 57053 | Flagstar Bank, N.A. | March 12, 2023 | 10540 |
| 7 | Silicon Valley Bank | Santa Clara | California | 24735 | First Citizens Bank & Trust Company | March 10, 2023 | 10539 |
| 8 | Almena State Bank | Almena | Kansas | 15426 | Equity Bank | October 23, 2020 | 10538 |
| 9 | First City Bank of Florida | Fort Walton Beach | Florida | 16748 | United Fidelity Bank, fsb | October 16, 2020 | 10537 |


In [33]:
url="https://en.wikipedia.org/wiki/Mobile_country_code"
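# match keeps only the tables whose text matches the given string or regex;
# header=0 uses the first row as column names, and [0] picks the first matching table
# pd.read_html(url,match="Country",header=0)[0]

pd.read_html(url, match="91", header=0)[0]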

| | MCC | MNC | Brand | Operator | Status | Bands (MHz) | References and notes |
|---|---|---|---|---|---|---|---|
| 0 | 901 | 1 | | Webbing | Unknown | MVNO | Former ICO Satellite Management[51][52] |
| 1 | 901 | 2 | | GlobalmatiX AG | Unknown | Unknown | Former Sense Communications International; veh... |
| 2 | 901 | 3 | Iridium | | Operational | Satellite | |
| 3 | 901 | 4 | | BBIX Singapore Pte. Ltd. | Unknown | Unknown | Former Globalstar[54] |
| 4 | 901 | 5 | | Thuraya RMSS Network | Operational | Satellite | |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 99 | 902 | 1 | | MulteFire Alliance | Operational | LTE | [6][126] |
| 100 | 991 | 1 | | World's Global Telecom | Not operational | Unknown | temporarily assigned until 15 January 2021[104... |
| 101 | 991 | 2 | 5G Croco | Orange S.A. | Not operational | 5G | temporarily assigned until 6 August 2022[128][... |
| 102 | 991 | 3 | | Halys SAS | Not operational | Unknown | temporary assignment for trial until 5 April 2... |
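The two `read_html` calls above rely on live pages, so here is a minimal offline sketch of the same behaviour. It parses a hand-written HTML string with two tables (the table contents are made up) and shows that `read_html` returns a list, and that `match` narrows that list to tables containing the given text. It assumes an HTML parser such as `lxml`, or `beautifulsoup4` with `html5lib`, is installed, as in the cells above.

```python
from io import StringIO

import pandas as pd

# Hypothetical page content with two separate tables
html = """
<table>
  <tr><th>Bank Name</th><th>State</th></tr>
  <tr><td>Example Bank A</td><td>Kansas</td></tr>
</table>
<table>
  <tr><th>MCC</th><th>Operator</th></tr>
  <tr><td>901</td><td>Example Operator</td></tr>
</table>
"""

tables = pd.read_html(StringIO(html), header=0)               # one DataFrame per table
matched = pd.read_html(StringIO(html), match="MCC", header=0)  # only tables containing "MCC"

print(len(tables), len(matched))
print(matched[0])
```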


In [34]:
!pip install openpyxl



In [35]:
df_excel=pd.read_excel('data.xlsx')
df_excel

| | Name | Age |
|---|---|---|
| 0 | Karan | 32 |
| 1 | Jack | 34 |
| 2 | John | 31 |


`df_excel.to_pickle('df_excel')` saves the pandas DataFrame named `df_excel` to a file named 'df_excel' in the pickle format.

**In essence:**

It performs **serialization** of the DataFrame into a binary file, preserving its structure and data types. You can later load it back into a DataFrame using `pd.read_pickle('df_excel')`.

**Use it when:**

* You want to save a DataFrame for **faster reading** in a future Python session compared to text-based formats like CSV.
* You need to **preserve the data types** and structure of the DataFrame exactly as they are.

The resulting file 'df_excel' is binary and not human-readable. Only load pickle files from sources you trust, because unpickling can execute arbitrary code.
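To illustrate the point about preserved data types, the sketch below saves the same DataFrame as both pickle and CSV in a temporary directory and compares what comes back. The `joined` date column, the file names, and the variable names are hypothetical additions used only to show a dtype that CSV does not keep.

```python
import os
import tempfile

import pandas as pd

df_people = pd.DataFrame({
    "Name": ["Karan", "Jack", "John"],
    "Age": [32, 34, 31],
    # Hypothetical column, added only to show a dtype that CSV loses
    "joined": pd.to_datetime(["2020-01-15", "2021-06-01", "2019-11-30"]),
})

with tempfile.TemporaryDirectory() as tmp:
    pickle_path = os.path.join(tmp, "people.pkl")
    csv_path = os.path.join(tmp, "people.csv")

    df_people.to_pickle(pickle_path)
    df_people.to_csv(csv_path, index=False)

    from_pickle = pd.read_pickle(pickle_path)
    from_csv = pd.read_csv(csv_path)

print(from_pickle.dtypes)  # datetime column survives the round trip
print(from_csv.dtypes)     # dates come back as text unless parsed explicitly
```

With CSV you would need `parse_dates` or a later `pd.to_datetime` call to restore the column, whereas the pickle round trip needs no extra arguments.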

In [37]:
df_excel.to_pickle('df_excel') # file saved but not directly readable

In [38]:
# read pickle file
pd.read_pickle('df_excel')

| | Name | Age |
|---|---|---|
| 0 | Karan | 32 |
| 1 | Jack | 34 |
| 2 | John | 31 |
