### StringIO is a class in Python’s io module.
It lets you treat a string as a file object.

In simple words:
📄 If you have data in a string (instead of an actual file), StringIO makes it behave like a file so functions like pd.read_csv() or pd.read_json() can read it.
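Many Pandas functions expect a file path or a file-like object.
If your data is already in a string (e.g., coming from an API or copy-pasted JSON), you can wrap it in StringIO so it behaves like a file. Since pandas 2.1, passing a literal JSON string straight to pd.read_json() is deprecated, so wrapping it in StringIO is the recommended approach.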

In [22]:
import pandas as pd
from io import StringIO

data = """
[
  {
    "id": 1,
    "name": "Alice",
    "age": 28,
    "email": "alice@example.com",
    "street": "123 Main St",
    "city": "New York",
    "zipcode": "10001"
  }
]
"""

data_file = StringIO(data)    # wrap the JSON string in a file-like object
df = pd.read_json(data_file)  # read_json reads from it as if it were a file

df
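StringIO works in both directions. The small sketch below uses a made-up two-row CSV string (not part of the original notebook) to show that the buffer supports ordinary file methods such as readline() and seek(), that pd.read_csv() reads it like an open file, and that df.to_csv() can write into a StringIO whose text you get back with getvalue().

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV text used only for this illustration.
csv_text = "name,age\nAlice,28\nBob,35\n"
buffer = StringIO(csv_text)

first_line = buffer.readline()  # reads only the header line: "name,age\n"
buffer.seek(0)                  # rewind so pandas sees the header as well
people = pd.read_csv(buffer)    # read_csv accepts the buffer like an open file

# The other direction: write a DataFrame into an in-memory "file".
out = StringIO()
people.to_csv(out, index=False)
csv_again = out.getvalue()      # the CSV text, without touching the disk
```

Forgetting the seek(0) is a common slip: after readline() the buffer position sits past the header, so read_csv would treat "Alice,28" as the header row.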

| | id | name | age | email | street | city | zipcode |
|---|---|---|---|---|---|---|---|
| 0 | 1 | Alice | 28 | alice@example.com | 123 Main St | New York | 10001 |


## df.to_json()
This Pandas method converts a DataFrame into JSON format.

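orient="columns" / "index" / "records" (defines how the DataFrame is serialized into JSON)

columns (default)	->    one key per column: {column: {index: value}}
index	->    one key per row label: {index: {column: value}}
records    -> a list of dictionaries, one per row: [{column: value}, ...], good for APIs

Other accepted values are "split", "values" and "table".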

In [None]:
df.to_json()   ## default -> orient="columns"

'{"id":{"0":1},"name":{"0":"Alice"},"age":{"0":28},"email":{"0":"alice@example.com"},"street":{"0":"123 Main St"},"city":{"0":"New York"},"zipcode":{"0":10001}}'

In [24]:
df.to_json(orient="index")

'{"0":{"id":1,"name":"Alice","age":28,"email":"alice@example.com","street":"123 Main St","city":"New York","zipcode":10001}}'

In [21]:
df.to_json(orient="records")

'[{"id":1,"name":"Alice","age":28,"email":"alice@example.com","street":"123 Main St","city":"New York","zipcode":10001}]'
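To compare the layouts side by side, here is a minimal sketch on a hypothetical two-row DataFrame (not the Alice data above). It also reads the records string back with pd.read_json(), wrapping it in StringIO.

```python
import json
from io import StringIO

import pandas as pd

# Hypothetical two-row DataFrame for illustration.
df = pd.DataFrame({"id": [1, 2], "name": ["Alice", "Bob"]})

as_columns = df.to_json()                  # default orient="columns"
as_index = df.to_json(orient="index")      # one key per row label
as_records = df.to_json(orient="records")  # list of row dictionaries

# Parse the strings so the layouts can be inspected as Python objects.
parsed = {
    "columns": json.loads(as_columns),
    "index": json.loads(as_index),
    "records": json.loads(as_records),
}

# A records string can be read back into a DataFrame.
back = pd.read_json(StringIO(as_records), orient="records")
```

Note that orient="records" does not store the index labels; "index" or "split" keep them if you need them on the way back.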

### pd.json_normalize()
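This function flattens semi-structured (nested) JSON data into a flat table (DataFrame).

It’s especially useful when your JSON has:
✅ Nested dictionaries (objects): their keys become dot-separated columns such as address.city
✅ Nested lists (arrays): by default they stay in a single column (like orders in the output below); pass record_path to turn each list item into its own row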

In [16]:

data=[
  {
    "id": 1,
    "name": "Alice",
    "age": 28,
    "email": "alice@example.com",
    "orders": [
      {"order_id": 101, "amount": 250.5, "date": "2025-07-01"},
      {"order_id": 102, "amount": 99.9, "date": "2025-07-05"}
    ],
    "address": {
      "street": "123 Main St",
      "city": "New York",
      "zipcode": "10001"
    }
  },
  {
    "id": 2,
    "name": "Bob",
    "age": 35,
    "email": "bob@example.com",
    "orders": [
      {"order_id": 103, "amount": 150.0, "date": "2025-07-03"},
      {"order_id": 104, "amount": 300.0, "date": "2025-07-10"},
      {"order_id": 105, "amount": 75.5, "date": "2025-07-12"}
    ],
    "address": {
      "street": "456 Park Ave",
      "city": "Chicago",
      "zipcode": "60614"
    }
  },
  {
    "id": 3,
    "name": "Charlie",
    "age": 40,
    "email": "charlie@example.com",
    "orders": [
      {"order_id": 106, "amount": 500.0, "date": "2025-07-08"}
    ],
    "address": {
      "street": "789 Elm St",
      "city": "Los Angeles",
      "zipcode": "90001"
    }
  },
  {
    "id": 4,
    "name": "Diana",
    "age": 30,
    "email": "diana@example.com",
    "orders": [],
    "address": {
      "street": "321 Oak St",
      "city": "Houston",
      "zipcode": "77002"
    }
  }
]

pd.json_normalize(data)

| | id | name | age | email | orders | address.street | address.city | address.zipcode |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Alice | 28 | alice@example.com | [{'order_id': 101, 'amount': 250.5, 'date': '2... | 123 Main St | New York | 10001 |
| 1 | 2 | Bob | 35 | bob@example.com | [{'order_id': 103, 'amount': 150.0, 'date': '2... | 456 Park Ave | Chicago | 60614 |
| 2 | 3 | Charlie | 40 | charlie@example.com | [{'order_id': 106, 'amount': 500.0, 'date': '2... | 789 Elm St | Los Angeles | 90001 |
| 3 | 4 | Diana | 30 | diana@example.com | [] | 321 Oak St | Houston | 77002 |
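The orders column above is still a list. A sketch of how record_path and meta can expand it into one row per order, using a trimmed-down copy of the customer data (two customers, fewer fields) so the block runs on its own:

```python
import pandas as pd

# Trimmed-down version of the customer data above.
customers = [
    {
        "id": 1,
        "name": "Alice",
        "orders": [
            {"order_id": 101, "amount": 250.5},
            {"order_id": 102, "amount": 99.9},
        ],
        "address": {"city": "New York"},
    },
    {"id": 4, "name": "Diana", "orders": [], "address": {"city": "Houston"}},
]

# record_path: the nested list to expand into rows (one row per order).
# meta: parent fields copied onto every row; ["address", "city"] is a nested path.
orders = pd.json_normalize(
    customers,
    record_path="orders",
    meta=["id", "name", ["address", "city"]],
)
# Diana's "orders" list is empty, so she contributes no rows to this table.
```

The nested meta path ends up as an "address.city" column, using the same dot separator as the flattened columns above.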


In [27]:
### read data from url
url="https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data"
df=pd.read_csv(url,header=None)
df.head(8)

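| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 14.23 | 1.71 | 2.43 | 15.6 | 127 | 2.8 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065 |
| 1 | 1 | 13.2 | 1.78 | 2.14 | 11.2 | 100 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.4 | 1050 |
| 2 | 1 | 13.16 | 2.36 | 2.67 | 18.6 | 101 | 2.8 | 3.24 | 0.3 | 2.81 | 5.68 | 1.03 | 3.17 | 1185 |
| 3 | 1 | 14.37 | 1.95 | 2.5 | 16.8 | 113 | 3.85 | 3.49 | 0.24 | 2.18 | 7.8 | 0.86 | 3.45 | 1480 |
| 4 | 1 | 13.24 | 2.59 | 2.87 | 21.0 | 118 | 2.8 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735 |
| 5 | 1 | 14.2 | 1.76 | 2.45 | 15.2 | 112 | 3.27 | 3.39 | 0.34 | 1.97 | 6.75 | 1.05 | 2.85 | 1450 |
| 6 | 1 | 14.39 | 1.87 | 2.45 | 14.6 | 96 | 2.5 | 2.52 | 0.3 | 1.98 | 5.25 | 1.02 | 3.58 | 1290 |
| 7 | 1 | 14.06 | 2.15 | 2.61 | 17.6 | 121 | 2.6 | 2.51 | 0.31 | 1.25 | 5.05 | 1.06 | 3.58 | 1295 |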
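Because wine.data has no header row, header=None gives integer column labels 0 to 13. A sketch of supplying readable labels with names=, run offline on the first two rows shown above. The column names are an assumption, paraphrased from the dataset's wine.names description; check them against that file before relying on them.

```python
from io import StringIO

import pandas as pd

# First two rows of wine.data, copied from the output above (no header row).
sample = StringIO(
    "1,14.23,1.71,2.43,15.6,127,2.8,3.06,0.28,2.29,5.64,1.04,3.92,1065\n"
    "1,13.2,1.78,2.14,11.2,100,2.65,2.76,0.26,1.28,4.38,1.05,3.4,1050\n"
)

# Assumed labels (class first, then 13 measurements).
column_names = [
    "class", "alcohol", "malic_acid", "ash", "alcalinity_of_ash", "magnesium",
    "total_phenols", "flavanoids", "nonflavanoid_phenols", "proanthocyanins",
    "color_intensity", "hue", "od280_od315", "proline",
]

# header=None: the first line is data; names=: use these labels instead of 0..13.
wine = pd.read_csv(sample, header=None, names=column_names)
```

With the real file the call is the same: pd.read_csv(url, header=None, names=column_names).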


In [28]:
df.to_csv("wine.csv", index=False)   ## store data in csv (index=False skips writing the row index)
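Why index=False matters: by default to_csv() writes the row index as an extra unnamed first column, which comes back as "Unnamed: 0" when the file is read again. A minimal sketch with a hypothetical DataFrame, using in-memory CSV text instead of a file:

```python
from io import StringIO

import pandas as pd

# Hypothetical DataFrame for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# to_csv() with no path returns the CSV text; the default writes the index too.
with_index = pd.read_csv(StringIO(df.to_csv()))

# index=False writes only the data columns, so the round trip is clean.
without_index = pd.read_csv(StringIO(df.to_csv(index=False)))
```

If you do want to keep the index, write it as usual and read it back with pd.read_csv(..., index_col=0).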

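### pd.read_html() Overview
This function reads HTML tables into Pandas DataFrames. It always returns a list of DataFrames (one per matching table), so you pick the one you want with an index such as df[0]. It needs an HTML parser (lxml, or BeautifulSoup4 with html5lib) to be installed.

✅ io (the first argument): a URL, a local file path, or a file-like object. If you pass a URL, Pandas fetches the webpage and extracts every table into the list.

✅ match="": keeps only the table(s) containing text that matches a specific string or regular expression. The default, ".+", matches every table.

✅ header=: the row (or list of rows) to use as column headers. The default is None, which lets Pandas infer headers from the table's header cells; header=0 uses the first row.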

In [None]:
### read tables from a web page
url="https://www.fdic.gov/bank-failures/failed-bank-list"
df=pd.read_html(url,match="Bank Name",header=0)
df[0].head(3)

| | Bank Name | City | State | Cert | Acquiring Institution | Closing Date | Fund Sort ascending |
|---|---|---|---|---|---|---|---|
| 0 | The Santa Anna National Bank | Santa Anna | Texas | 5520 | Coleman County State Bank | June 27, 2025 | 10549 |
| 1 | Pulaski Savings Bank | Chicago | Illinois | 28611 | Millennium Bank | January 17, 2025 | 10548 |
| 2 | The First National Bank of Lindsay | Lindsay | Oklahoma | 4134 | First Bank & Trust Co., Duncan, OK | October 18, 2024 | 10547 |


In [44]:
### read tables from a web page
url="https://en.wikipedia.org/wiki/Mobile_country_code#National_operators"
df=pd.read_html(url)
df[0]

| | MCC | MNC | Brand | Operator | Status | Bands (MHz) | References and notes |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | TEST | Test network | Operational | any | |
| 1 | 1 | 1 | TEST | Test network | Operational | any | |
| 2 | 999 | 99 | | Internal use | Operational | any | Internal use in private networks, no roaming[6] |
| 3 | 999 | 999 | | Internal use | Operational | any | Internal use in private networks, no roaming[6] |


In [45]:
## read data from excel file (.xlsx files need the openpyxl package installed)
df_excel=pd.read_excel("Book1.xlsx")
df_excel

| | Name | Age |
|---|---|---|
| 0 | Susovan Paul | 23 |
| 1 | Pupai Paul | 21 |
| 2 | Taniya Dey | 21 |
| 3 | MimI Dey | 20 |


## What is Pickle?
✅ Pickle is like a "Save and Load" button for Python objects.

📝 You can save your Python data (lists, dictionaries, DataFrames, ML models) into a file.

📂 Later, you can load it back exactly how it was.

Think of it like this 👇

📦 Your Python object ➡️ 🗂️ Save as a Pickle file (.pkl) ➡️ 📦 Load back when needed

🟢 Why use Pickle?
✅ So you don’t need to recreate the same data or model every time.

✅ You can:

Save your data once and reload it later.

Save your ML models so you don’t have to retrain them.

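✅ Usually faster to save and load than CSV or JSON, and it keeps Python-specific details (column dtypes such as datetimes, the index) that text formats lose.

⚠️ Only load pickle files from sources you trust: unpickling can execute arbitrary code. Pickle files are also Python-specific, so use CSV or JSON when sharing data with other tools.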

In [46]:
df_excel.to_pickle("df_excel.pkl")  ## store into a local pickle file

In [48]:
df = pd.read_pickle("df_excel.pkl")  ## load it back exactly as it was saved
df

| | Name | Age |
|---|---|---|
| 0 | Susovan Paul | 23 |
| 1 | Pupai Paul | 21 |
| 2 | Taniya Dey | 21 |
| 3 | MimI Dey | 20 |
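To see what "load it back exactly how it was" means in practice, here is a sketch comparing a pickle round trip with a CSV round trip. It uses a hypothetical DataFrame with a datetime column and writes to a temporary directory so nothing is left behind.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data: one text column and one datetime column.
df = pd.DataFrame(
    {
        "name": ["Alice", "Bob"],
        "joined": pd.to_datetime(["2024-01-15", "2024-03-02"]),
    }
)

with tempfile.TemporaryDirectory() as tmp:
    pkl_path = os.path.join(tmp, "example.pkl")
    csv_path = os.path.join(tmp, "example.csv")

    df.to_pickle(pkl_path)            # binary, Python-specific format
    df.to_csv(csv_path, index=False)  # plain text

    from_pickle = pd.read_pickle(pkl_path)  # dtypes come back unchanged
    from_csv = pd.read_csv(csv_path)        # "joined" comes back as plain text
```

read_csv can recover the dates with parse_dates=["joined"], but you have to know and repeat that step; the pickle keeps it for you.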
