### Reading Data From Different Sources


In [22]:
import pandas as pd
from io import StringIO
Data = '{"employee_name": "James", "email": "james@gmail.com", "job_profile": [{"title1":"Team Lead", "title2":"Sr. Developer"}]}'
df=pd.read_json(StringIO(Data))

`StringIO` is a class in Python's `io` module that wraps a string in a file-like object, so functions such as `pd.read_json()` can read from it as if it were a file.

In the cell above:

```python
from io import StringIO
Data = '{"employee_name": "James", "email": "james@gmail.com", "job_profile": [{"title1":"Team Lead", "title2":"Sr. Developer"}]}'
df = pd.read_json(StringIO(Data))
```

- `StringIO(Data)` converts the `Data` string into a file-like object that `pd.read_json()` can read from.
- `pd.read_json()` then parses the JSON text it reads from that object and converts it into a DataFrame.

In summary, `StringIO` allows you to simulate file operations on in-memory string data.

### If `Data` were already a parsed Python object (a `dict`) rather than a JSON string, you could pass it directly to `pd.DataFrame()` without `StringIO` or `pd.read_json()`.
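A minimal sketch of that alternative, assuming the same record is first parsed with the standard-library `json.loads`. The names `raw`, `record`, and `df_from_dict` are illustrative and not part of the original notebook.

```python
import json

import pandas as pd

raw = '{"employee_name": "James", "email": "james@gmail.com", "job_profile": [{"title1": "Team Lead", "title2": "Sr. Developer"}]}'

# json.loads turns the JSON text into a Python dict
record = json.loads(raw)

# The dict goes straight to the DataFrame constructor; the scalar values
# are broadcast to match the one-element list stored under "job_profile"
df_from_dict = pd.DataFrame(record)

print(df_from_dict.shape)
print(list(df_from_dict.columns))
```

No `StringIO` is needed here because nothing is being read as a file: the data is already a Python object.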

### In a JSON file read by pandas, the outermost structure should be a JSON object or array, not a quoted JSON string.

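Functions like `pd.read_json()` and `pd.read_csv()` expect a file path, URL, or file-like object. That last category is broad: it covers in-memory file-like objects as well as files opened from disk. Recent pandas versions (2.1 and later) also emit a deprecation warning when a literal JSON string is passed directly to `pd.read_json()`, which is another reason to wrap the string in `StringIO`.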

### Here's why `StringIO` is used:
- **File-like objects**: Functions like `pd.read_json()` can work with **file-like objects**, which means they can read from something that behaves like a file (even if it's not a physical file). A **file-like object** is something that has the same methods as a file (e.g., `.read()`), allowing the function to interact with it the same way it would with an actual file.
  
- **StringIO** is a way to simulate a file in memory. It provides an in-memory file object that behaves just like a file, allowing you to pass in a string (which could represent JSON data) instead of reading from a physical file on disk.
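To make "behaves like a file" concrete, here is a small sketch using only the standard library. The sample text and variable names are made up for illustration.

```python
from io import StringIO

buffer = StringIO("first line\nsecond line\n")

# Like an open file, the buffer keeps track of a read position
first = buffer.readline()  # reads up to and including the first newline
rest = buffer.read()       # reads everything after that position

# seek(0) rewinds to the start, just as it would on a real file
buffer.seek(0)
everything = buffer.read()

print(repr(first))
print(repr(rest))
print(repr(everything))
```

These are the same methods (`read`, `readline`, `seek`) that a file opened with `open()` exposes, which is why pandas readers can consume the buffer without knowing it is not on disk.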

### Example use case:
Normally, when you load a file, you do something like:

```python
df = pd.read_json('path/to/your/file.json')
```

But when the data is already in a string (as in the example above), and you want to **simulate reading from a file**, you can use `StringIO`:

```python
from io import StringIO
Data = '{"employee_name": "James", "email": "james@gmail.com", "job_profile": [{"title1":"Team Lead", "title2":"Sr. Developer"}]}'
df = pd.read_json(StringIO(Data))  # Treats the string as if it were a file
```

Here, `StringIO(Data)` gives you a **file-like object** containing the JSON string, so `pd.read_json()` can treat it like it's reading from a file.

### Why do you need this?

- **Convenience**: Sometimes, you have data as a string (e.g., from an API or dynamic source) and don't want to write it to disk as a file just to read it. You can directly parse it using `StringIO`.
- **Efficiency**: It's often more efficient to handle data in-memory rather than writing it to disk, especially for temporary or dynamic data.

### Summary:
The reason you use `StringIO` here is to **simulate a file** in memory so that functions like `pd.read_json()` can handle it just like they would with a physical file or URL. This allows you to read the data from a string without having to write it to a file first.
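The same pattern works for other readers that accept file-like objects. As a sketch, assuming some CSV text is already held in a string (the sample rows and the name `df_csv` are illustrative):

```python
from io import StringIO

import pandas as pd

# CSV text that might have come from an API response or another in-memory source
csv_text = "Name,Age\nKrish,32\nJack,34\nJohn,31\n"

# read_csv consumes the buffer exactly as it would an open file
df_csv = pd.read_csv(StringIO(csv_text))

print(df_csv)
```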

In [23]:
df

Unnamed: 0,employee_name,email,job_profile
0,James,james@gmail.com,"{'title1': 'Team Lead', 'title2': 'Sr. Develop..."


In [24]:
df.to_json()

'{"employee_name":{"0":"James"},"email":{"0":"james@gmail.com"},"job_profile":{"0":{"title1":"Team Lead","title2":"Sr. Developer"}}}'

In [25]:
print(type(df.to_json()))

<class 'str'>


In [26]:
df.to_json(orient='index')

'{"0":{"employee_name":"James","email":"james@gmail.com","job_profile":{"title1":"Team Lead","title2":"Sr. Developer"}}}'

In [27]:
df.to_json(orient='records')

'[{"employee_name":"James","email":"james@gmail.com","job_profile":{"title1":"Team Lead","title2":"Sr. Developer"}}]'
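The three `orient` values above differ only in how the same data is nested. One way to see this, sketched here under the assumption that the notebook's `Data` string is reused, is to parse each result back into Python objects with `json.loads`. The names `by_column`, `by_index`, `by_record`, and `restored` are illustrative.

```python
import json
from io import StringIO

import pandas as pd

Data = '{"employee_name": "James", "email": "james@gmail.com", "job_profile": [{"title1":"Team Lead", "title2":"Sr. Developer"}]}'
df = pd.read_json(StringIO(Data))

# Parse each JSON string back into Python objects to compare the nesting
by_column = json.loads(df.to_json())                  # {column: {index: value}}
by_index = json.loads(df.to_json(orient="index"))     # {index: {column: value}}
by_record = json.loads(df.to_json(orient="records"))  # [{column: value}, ...]

print(by_column["employee_name"]["0"])
print(by_index["0"]["employee_name"])
print(by_record[0]["employee_name"])

# A "records" string can be read back by passing the same orient
restored = pd.read_json(StringIO(df.to_json(orient="records")), orient="records")
print(restored.shape)
```

Note that `orient="records"` drops the index labels, so a round trip through it comes back with a fresh default index.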

In [42]:
df=pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data",header=None)

In [43]:
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13
0,1,14.23,1.71,2.43,15.6,127,2.8,3.06,0.28,2.29,5.64,1.04,3.92,1065
1,1,13.2,1.78,2.14,11.2,100,2.65,2.76,0.26,1.28,4.38,1.05,3.4,1050
2,1,13.16,2.36,2.67,18.6,101,2.8,3.24,0.3,2.81,5.68,1.03,3.17,1185
3,1,14.37,1.95,2.5,16.8,113,3.85,3.49,0.24,2.18,7.8,0.86,3.45,1480
4,1,13.24,2.59,2.87,21.0,118,2.8,2.69,0.39,1.82,4.32,1.04,2.93,735


In [44]:
df.to_csv("wine.csv")
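By default `to_csv` also writes the row index as an unnamed first column, which comes back as `Unnamed: 0` when the file is read again. The sketch below shows the difference on a small made-up frame, writing to a string instead of disk; `small` and the other names are illustrative.

```python
from io import StringIO

import pandas as pd

small = pd.DataFrame({"Name": ["Krish", "Jack"], "Age": [32, 34]})

# With no path argument, to_csv returns the CSV text as a string
with_index = small.to_csv()
without_index = small.to_csv(index=False)

# Reading the default output back adds an "Unnamed: 0" column for the saved index
reread = pd.read_csv(StringIO(with_index))
print(list(reread.columns))

# With index=False the columns round-trip unchanged
clean = pd.read_csv(StringIO(without_index))
print(list(clean.columns))
```

Passing `index=False` is usually what you want when the index is just the default 0, 1, 2, ... numbering.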

In [31]:
!pip install lxml



In [32]:
!pip install html5lib
!pip install beautifulsoup4



In [33]:
url="https://www.fdic.gov/resources/resolutions/bank-failures/failed-bank-list/"

# Reads the HTML tables (<table> elements) on the page; returns a list of DataFrames
df=pd.read_html(url)

In [34]:
df  # a list whose only element is the table

[                               Bank Name               City          State  \
 0     The First National Bank of Lindsay            Lindsay       Oklahoma   
 1  Republic First Bank dba Republic Bank       Philadelphia   Pennsylvania   
 2                          Citizens Bank           Sac City           Iowa   
 3               Heartland Tri-State Bank            Elkhart         Kansas   
 4                    First Republic Bank      San Francisco     California   
 5                         Signature Bank           New York       New York   
 6                    Silicon Valley Bank        Santa Clara     California   
 7                      Almena State Bank             Almena         Kansas   
 8             First City Bank of Florida  Fort Walton Beach        Florida   
 9                   The First State Bank      Barboursville  West Virginia   
 
     Cert                 Aquiring Institution      Closing Date  \
 0   4134   First Bank & Trust Co., Duncan, OK  October 18, 2

In [35]:
df[0]

Unnamed: 0,Bank Name,City,State,Cert,Aquiring Institution,Closing Date,Fund Sort ascending
0,The First National Bank of Lindsay,Lindsay,Oklahoma,4134,"First Bank & Trust Co., Duncan, OK","October 18, 2024",10547
1,Republic First Bank dba Republic Bank,Philadelphia,Pennsylvania,27332,"Fulton Bank, National Association","April 26, 2024",10546
2,Citizens Bank,Sac City,Iowa,8758,Iowa Trust & Savings Bank,"November 3, 2023",10545
3,Heartland Tri-State Bank,Elkhart,Kansas,25851,"Dream First Bank, N.A.","July 28, 2023",10544
4,First Republic Bank,San Francisco,California,59017,"JPMorgan Chase Bank, N.A.","May 1, 2023",10543
5,Signature Bank,New York,New York,57053,"Flagstar Bank, N.A.","March 12, 2023",10540
6,Silicon Valley Bank,Santa Clara,California,24735,First Citizens Bank & Trust Company,"March 10, 2023",10539
7,Almena State Bank,Almena,Kansas,15426,Equity Bank,"October 23, 2020",10538
8,First City Bank of Florida,Fort Walton Beach,Florida,16748,"United Fidelity Bank, fsb","October 16, 2020",10537
9,The First State Bank,Barboursville,West Virginia,14361,"MVB Bank, Inc.","April 3, 2020",10536


In [36]:
url="https://en.wikipedia.org/wiki/Mobile_country_code"
pd.read_html(url, match="Country", header=0)[0]  # keeps only tables containing text that matches "Country"; read_html returns a list, so [0] takes the first


Unnamed: 0,Mobile country code,Country,ISO 3166,Mobile network codes,National MNC authority,Remarks
0,289,A Abkhazia,GE-AB,List of mobile network codes in Abkhazia,,MCC is not listed by ITU
1,412,Afghanistan,AF,List of mobile network codes in Afghanistan,,
2,276,Albania,AL,List of mobile network codes in Albania,,
3,603,Algeria,DZ,List of mobile network codes in Algeria,,
4,544,American Samoa (United States of America),AS,List of mobile network codes in American Samoa,,
...,...,...,...,...,...,...
247,452,Vietnam,VN,List of mobile network codes in the Vietnam,,
248,543,W Wallis and Futuna,WF,List of mobile network codes in Wallis and Futuna,,
249,421,Y Yemen,YE,List of mobile network codes in the Yemen,,
250,645,Z Zambia,ZM,List of mobile network codes in Zambia,,


### In `pd.read_html()`, the `header=0` argument specifies that the first row of the HTML table is used as the column headers of the resulting DataFrame.
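The effect of `header=0` and `match` can be checked offline on a small inline page. This is a sketch on made-up HTML, assuming a parser such as `lxml` is installed as in the cells above. The header rows deliberately use `<td>` cells so that pandas does not infer a header on its own.

```python
from io import StringIO

import pandas as pd

html = """
<table>
  <tr><td>Code</td><td>Country</td></tr>
  <tr><td>412</td><td>Afghanistan</td></tr>
  <tr><td>276</td><td>Albania</td></tr>
</table>
<table>
  <tr><td>Name</td><td>Age</td></tr>
  <tr><td>Krish</td><td>32</td></tr>
</table>
"""

# match="Country" keeps only tables whose text matches, so one table is returned
matched = pd.read_html(StringIO(html), match="Country")
print(len(matched))

# Without header=0 the first row stays in the data and columns are numbered
no_header = matched[0]
print(list(no_header.columns))

# header=0 promotes the first row to column names
with_header = pd.read_html(StringIO(html), match="Country", header=0)[0]
print(list(with_header.columns))
```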

In [37]:
!pip install openpyxl



In [46]:
df_excel=pd.read_excel('data.xlsx')
df_excel.head()

Unnamed: 0,Name,Age
0,Krish,32
1,Jack,34
2,John,31


In [39]:
df_excel.to_pickle('df_excel')

In [40]:
pd.read_pickle('df_excel')

Unnamed: 0,Name,Age
0,Krish,32
1,Jack,34
2,John,31
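Unlike CSV, a pickle stores the DataFrame object itself, so the index and column dtypes come back exactly as they were saved. A minimal sketch of that round trip, using a temporary directory and a made-up frame named `people` so nothing is left on disk:

```python
import os
import tempfile

import pandas as pd

people = pd.DataFrame({"Name": ["Krish", "Jack", "John"], "Age": [32, 34, 31]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "people.pkl")
    people.to_pickle(path)         # serialize the DataFrame to disk
    loaded = pd.read_pickle(path)  # load it back before the directory is removed

# equals() compares values, index, and dtypes
print(loaded.equals(people))
```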
