# Reading Data from Files

In this part, you will learn how to read different types of data into pandas:

- Reading CSV files  
- Reading Excel files  
- Reading JSON files  
- Reading from URLs  
- Reading SQL tables  
- Handling delimiters, missing headers, and large files  
- Reading nested JSON  
- Reading Parquet and ZIP files  

These examples cover the file formats you are most likely to encounter in real projects.


## 🟦 1. Import Libraries

In [1]:
import pandas as pd
from io import StringIO
import sqlite3

## 🟦 2. Reading CSV Files


CSV is one of the most common formats for tabular data.  

Below are different ways to load CSV data using `pd.read_csv()`.


In [2]:
csv_data = """
Name,Age,City
Alice,24,Toronto
Bob,19,Montreal
Charlie,22,Vancouver
"""

df_csv = pd.read_csv(StringIO(csv_data))
df_csv

      Name  Age       City
0    Alice   24    Toronto
1      Bob   19   Montreal
2  Charlie   22  Vancouver


### Using a Custom Delimiter

In [3]:
csv_data_semicolon = """
Name;Age;City
Alice;24;Toronto
Bob;19;Montreal
"""

df_delim = pd.read_csv(StringIO(csv_data_semicolon), sep=";")
df_delim

    Name  Age      City
0  Alice   24   Toronto
1    Bob   19  Montreal
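
When the delimiter is not known in advance, one option is to let pandas infer it. The sketch below assumes the slower Python parsing engine, which detects the separator when `sep=None` is passed; the sample text and the names `unknown_delim_data` and `df_sniffed` are illustrative only.

```python
from io import StringIO

import pandas as pd

# Illustrative sample: the delimiter is not passed to pandas explicitly.
unknown_delim_data = "Name;Age;City\nAlice;24;Toronto\nBob;19;Montreal\n"

# sep=None asks the Python engine to detect the delimiter from the data.
df_sniffed = pd.read_csv(StringIO(unknown_delim_data), sep=None, engine="python")
print(df_sniffed.columns.tolist())
```

Detection can guess wrong on unusual files, so passing `sep` explicitly is usually safer when the delimiter is known.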


### CSV Without a Header

In [4]:
csv_noheader = """
Alice,24,Toronto
Bob,19,Montreal
"""

df_no_header = pd.read_csv(StringIO(csv_noheader), header=None)
df_no_header

       0   1         2
0  Alice  24   Toronto
1    Bob  19  Montreal
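
With `header=None`, the columns are simply numbered. If the column meanings are known, they can be supplied through the `names` parameter. A minimal sketch, assuming the three columns are name, age, and city as in the earlier examples (the variable `df_named` is illustrative):

```python
from io import StringIO

import pandas as pd

csv_noheader = "Alice,24,Toronto\nBob,19,Montreal\n"

# header=None: the first line is data, not column names.
# names=[...]: labels to use instead of the default 0, 1, 2.
df_named = pd.read_csv(
    StringIO(csv_noheader),
    header=None,
    names=["Name", "Age", "City"],
)
print(df_named)
```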


### Reading Only Specific Columns

In [5]:
df_subset = pd.read_csv(StringIO(csv_data), usecols=["Name", "City"])
df_subset

      Name       City
0    Alice    Toronto
1      Bob   Montreal
2  Charlie  Vancouver


### Reading Large CSV in Chunks

In [7]:
chunk_data = """
col1,col2
1,10
2,20
3,30
4,40
"""

for chunk in pd.read_csv(StringIO(chunk_data), chunksize=2):
    print(chunk)
    print("")

   col1  col2
0     1    10
1     2    20

   col1  col2
2     3    30
3     4    40
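
Printing each chunk only shows the mechanics. In practice, chunks are usually reduced to a small result so the whole file never has to sit in memory at once. The following sketch, which reuses the same tiny sample, keeps a running sum and row count; the names `total` and `row_count` are illustrative.

```python
from io import StringIO

import pandas as pd

chunk_data = "col1,col2\n1,10\n2,20\n3,30\n4,40\n"

total = 0
row_count = 0

# Each iteration yields a DataFrame with at most `chunksize` rows.
for chunk in pd.read_csv(StringIO(chunk_data), chunksize=2):
    total += int(chunk["col2"].sum())  # accumulate a per-chunk aggregate
    row_count += len(chunk)

print(total, row_count)
```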



## 🟦 3. Reading Excel Files

Use `pd.read_excel()` to load Excel `.xlsx` files.

pandas relies on an optional engine such as `openpyxl` for `.xlsx` files, so install it if the calls below raise an `ImportError`.

### Creating and Reading an Excel File

In [8]:
sample_excel = {
    "Product": ["Book", "Pen", "Notebook"],
    "Price": [12.5, 1.2, 4.75]
}

df_temp = pd.DataFrame(sample_excel)
df_temp.to_excel("sample.xlsx", index=False)

df_excel = pd.read_excel("sample.xlsx")
df_excel

    Product  Price
0      Book  12.50
1       Pen   1.20
2  Notebook   4.75


### Reading from a Specific Sheet

In [9]:
# Create multi-sheet Excel for demonstration
with pd.ExcelWriter("multi_sheet.xlsx") as writer:
    df_temp.to_excel(writer, sheet_name="Sheet1", index=False)
    df_temp.to_excel(writer, sheet_name="Sheet2", index=False)

df_sheet2 = pd.read_excel("multi_sheet.xlsx", sheet_name="Sheet2")
df_sheet2

    Product  Price
0      Book  12.50
1       Pen   1.20
2  Notebook   4.75


## 🟦 4. Reading JSON Files

Pandas can read JSON:
- from files
- from strings
- from nested structures using `json_normalize`

### Simple JSON

In [10]:
json_data = """
[
  {"Name": "Alice", "Age": 24},
  {"Name": "Bob", "Age": 19}
]
"""

df_json = pd.read_json(StringIO(json_data))
df_json

    Name  Age
0  Alice   24
1    Bob   19
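
Logs and data exports often store one JSON object per line (the "JSON Lines" layout) instead of a single array. As a brief sketch, `pd.read_json()` can parse that layout when `lines=True` is passed; the sample string and the name `df_jsonl` are made up for illustration.

```python
from io import StringIO

import pandas as pd

# One JSON object per line, with no enclosing array.
jsonl_data = '{"Name": "Alice", "Age": 24}\n{"Name": "Bob", "Age": 19}\n'

# lines=True tells pandas to parse each line as a separate record.
df_jsonl = pd.read_json(StringIO(jsonl_data), lines=True)
print(df_jsonl)
```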


### Reading Nested JSON

In [11]:
nested_json = {
    "name": "Alice",
    "scores": {"math": 90, "science": 85}
}

df_nested = pd.json_normalize(nested_json)
df_nested

    name  scores.math  scores.science
0  Alice           90              85
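
Nested dictionaries are flattened into dotted column names, as above. When a record instead contains a list of sub-records, `json_normalize` can expand the list into rows via `record_path` and carry parent fields along via `meta`. The data below is invented for illustration and is not part of the original example.

```python
import pandas as pd

# Hypothetical records: each person has a list of score entries.
records = [
    {
        "name": "Alice",
        "city": "Toronto",
        "scores": [
            {"subject": "math", "score": 90},
            {"subject": "science", "score": 85},
        ],
    },
    {
        "name": "Bob",
        "city": "Montreal",
        "scores": [{"subject": "math", "score": 72}],
    },
]

# record_path: the list to turn into rows.
# meta: parent-level fields to repeat on every resulting row.
df_scores = pd.json_normalize(records, record_path="scores", meta=["name", "city"])
print(df_scores)
```

The result has one row per score entry, with `name` and `city` repeated for each.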


## 🟦 5. Reading Data from URLs

If a dataset is publicly accessible, pandas can read it directly.

### CSV from URL

In [12]:
url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv"
df_url = pd.read_csv(url)
df_url.head()

   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa


## 🟦 6. Reading from SQL Databases

Use `pd.read_sql()` with an SQL connection. 
 
Below is an example using SQLite.

### SQLite Table

In [13]:
connection = sqlite3.connect(":memory:")

df_sample = pd.DataFrame({
    "id": [1, 2, 3],
    "value": ["A", "B", "C"]
})
df_sample.to_sql("sample_table", connection, index=False)

df_sql = pd.read_sql("SELECT * FROM sample_table", connection)
df_sql


   id value
0   1     A
1   2     B
2   3     C
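
Queries usually need a filter, and building SQL by string concatenation invites injection bugs. A safer pattern is to pass values through the `params` argument of `pd.read_sql()`. The sketch below rebuilds the same in-memory table and assumes SQLite's `?` placeholder style; other database drivers may use a different placeholder syntax.

```python
import sqlite3

import pandas as pd

connection = sqlite3.connect(":memory:")

# Recreate the small demo table from the example above.
pd.DataFrame({"id": [1, 2, 3], "value": ["A", "B", "C"]}).to_sql(
    "sample_table", connection, index=False
)

# The ? placeholder is filled in by the driver from `params`.
df_filtered = pd.read_sql(
    "SELECT * FROM sample_table WHERE id >= ?",
    connection,
    params=(2,),
)
print(df_filtered)

connection.close()
```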


## 🟦 7. Reading Parquet Files

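Parquet is a compressed, columnar file format that is typically faster to read and smaller on disk than CSV. pandas needs the `pyarrow` or `fastparquet` package to read and write it.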

In [14]:
df_temp.to_parquet("sample.parquet")
df_parquet = pd.read_parquet("sample.parquet")
df_parquet

    Product  Price
0      Book  12.50
1       Pen   1.20
2  Notebook   4.75


## 🟦 8. Reading ZIP Files Directly

Pandas can read a CSV file inside a ZIP archive without extracting it first, provided the archive contains exactly one file.

In [16]:
import zipfile

# Create zip for demo
df_temp.to_csv("temp.csv", index=False)
with zipfile.ZipFile("data.zip", "w") as z:
    z.write("temp.csv")

df_zip = pd.read_csv("data.zip")
df_zip


    Product  Price
0      Book  12.50
1       Pen   1.20
2  Notebook   4.75
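
Passing a ZIP path straight to `pd.read_csv()` only works for single-file archives. For an archive with several members, one approach is to open the wanted member with the standard `zipfile` module and hand the file object to pandas. This sketch builds a small archive in memory; the member names `products.csv` and `notes.txt` are hypothetical.

```python
import zipfile
from io import BytesIO

import pandas as pd

# Build a demo archive with two members, entirely in memory.
buffer = BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("products.csv", "Product,Price\nBook,12.5\nPen,1.2\n")
    archive.writestr("notes.txt", "not a table")

# Reopen the archive and read only the CSV member.
buffer.seek(0)
with zipfile.ZipFile(buffer) as archive:
    with archive.open("products.csv") as member:
        df_member = pd.read_csv(member)

print(df_member)
```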


## 🟦 9. Summary

In this section, you learned how to load data into pandas from many formats:

### 📄 CSV Files
- Standard CSV reading  
- Custom delimiters (`sep=";"`)  
- Files without headers  
- Loading specific columns (`usecols`)  
- Reading large files in chunks  

### 📊 Excel Files
- Reading `.xlsx`  
- Reading from a specific sheet  

### 🟤 JSON Files
- Standard JSON  
- Nested JSON with `json_normalize`  

### 🌐 URLs
- Reading CSV files directly from online sources  

### 🗄 SQL Databases
- Reading tables with `pd.read_sql()`  

### 📦 Advanced Formats
- Parquet files  
- ZIP archives  

You can now load most common file formats into pandas.


# Great job! 🎉
