In [1]:
import pandas as pd

## **Reading Files in Pandas**

Pandas provides various functions to read different file formats, including **CSV, TXT, JSON, and Excel**.  

---

## **1. Reading CSV Files**  

### **1.1 Basic CSV Reading**
The `read_csv()` function reads a CSV file and returns a DataFrame.

In [2]:
df = pd.read_csv(r"dummy data\countries of the world.csv")
df

Unnamed: 0,Country,Region
0,Afghanistan,ASIA (EX. NEAR EAST)
1,Albania,EASTERN EUROPE
2,Algeria,NORTHERN AFRICA
3,American Samoa,OCEANIA
4,Andorra,WESTERN EUROPE
...,...,...
222,West Bank,NEAR EAST
223,Western Sahara,NORTHERN AFRICA
224,Yemen,NEAR EAST
225,Zambia,SUB-SAHARAN AFRICA


### **1.2 Reading CSV Without Headers**
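If the CSV file has no header row, pass `header=None` and assign column names manually with the `names` parameter. Note that this particular file does have a header row, so with `header=None` the original header (`Country`, `Region`) is read as the first data row, as the output shows.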

In [3]:
df = pd.read_csv(r"dummy data\countries of the world.csv", 
                 header=None, names=['country', 'region'])
df

Unnamed: 0,country,region
0,Country,Region
1,Afghanistan,ASIA (EX. NEAR EAST)
2,Albania,EASTERN EUROPE
3,Algeria,NORTHERN AFRICA
4,American Samoa,OCEANIA
...,...,...
223,West Bank,NEAR EAST
224,Western Sahara,NORTHERN AFRICA
225,Yemen,NEAR EAST
226,Zambia,SUB-SAHARAN AFRICA
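When a file already contains a header row, one common way to rename the columns while reading is to combine `names` with `header=0`. The sketch below contrasts the two settings on a small in-memory CSV; the `io.StringIO` buffer is a stand-in for a real file (an assumption made so the example is self-contained), and the two sample rows are copied from the output above.

```python
import io
import pandas as pd

# In-memory stand-in for a CSV file that already has a header row.
csv_text = (
    "Country,Region\n"
    "Afghanistan,ASIA (EX. NEAR EAST)\n"
    "Albania,EASTERN EUROPE\n"
)

# header=None treats every line as data, so the old header becomes row 0.
as_data = pd.read_csv(io.StringIO(csv_text), header=None,
                      names=["country", "region"])

# header=0 marks the first line as the header and replaces it with `names`.
renamed = pd.read_csv(io.StringIO(csv_text), header=0,
                      names=["country", "region"])

print(as_data.shape)   # (3, 2): header line counted as data
print(renamed.shape)   # (2, 2): header line replaced
```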


---
## **2. Reading TXT Files**  

### **2.1 Reading a TXT File as CSV**
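A text file can be read with the `read_csv()` function, but the function assumes a comma separator by default. Because this file is tab-separated, each line is read into a single column with the `\t` left in place.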


In [4]:
df = pd.read_csv(r"dummy data\countries of the world.txt")
df

Unnamed: 0,Country\tRegion
0,Afghanistan \tASIA (EX. NEAR EAST)
1,Albania \tEASTERN EUROPE
2,Algeria \tNORTHERN AFRICA
3,American Samoa \tOCEANIA ...
4,Andorra \tWESTERN EUROPE
...,...
222,West Bank \tNEAR EAST
223,Western Sahara \tNORTHERN AFRICA ...
224,Yemen \tNEAR EAST
225,Zambia \tSUB-SAHARAN AFRICA


### **2.2 Reading a TXT File with a Specified Separator**
If the text file is **tab-separated**, we can specify the separator using the `sep` parameter.

In [5]:
df = pd.read_csv(r"dummy data\countries of the world.txt", sep='\t')
df

Unnamed: 0,Country,Region
0,Afghanistan,ASIA (EX. NEAR EAST)
1,Albania,EASTERN EUROPE
2,Algeria,NORTHERN AFRICA
3,American Samoa,OCEANIA
4,Andorra,WESTERN EUROPE
...,...,...
222,West Bank,NEAR EAST
223,Western Sahara,NORTHERN AFRICA
224,Yemen,NEAR EAST
225,Zambia,SUB-SAHARAN AFRICA


### **2.3 Using `read_table()` to Read a TXT File**
The `read_table()` function is similar to `read_csv()`, but assumes a **tab separator** by default.

In [6]:
df = pd.read_table(r"dummy data\countries of the world.txt")
df

Unnamed: 0,Country,Region
0,Afghanistan,ASIA (EX. NEAR EAST)
1,Albania,EASTERN EUROPE
2,Algeria,NORTHERN AFRICA
3,American Samoa,OCEANIA
4,Andorra,WESTERN EUROPE
...,...,...
222,West Bank,NEAR EAST
223,Western Sahara,NORTHERN AFRICA
224,Yemen,NEAR EAST
225,Zambia,SUB-SAHARAN AFRICA
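The three approaches in this section can be compared side by side. The following is a minimal sketch that uses an in-memory tab-separated string in place of the `.txt` file (an assumption for self-containment); the sample rows are taken from the outputs above.

```python
import io
import pandas as pd

# In-memory stand-in for a tab-separated text file.
tsv_text = (
    "Country\tRegion\n"
    "Afghanistan\tASIA (EX. NEAR EAST)\n"
    "Albania\tEASTERN EUROPE\n"
)

# Default comma separator: nothing is split, so there is a single column.
default_sep = pd.read_csv(io.StringIO(tsv_text))

# Explicit tab separator and read_table() produce the same two-column result.
with_tab = pd.read_csv(io.StringIO(tsv_text), sep="\t")
with_table = pd.read_table(io.StringIO(tsv_text))

print(default_sep.shape)            # (2, 1)
print(with_tab.equals(with_table))  # True
```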


---

## **3. Reading JSON Files**  

### **3.1 Reading a JSON File**
JSON files can be read into a DataFrame using the `read_json()` function.

In [7]:
df = pd.read_json(r"dummy data\json_sample.json")
df

Unnamed: 0,12 Strong,A Fantastic Woman (Una Mujer Fantástica),All The Money In The World,Bilal: A New Breed Of Hero,Call Me By Your Name,Darkest Hour,Den Of Thieves,Ferdinand,Fifty Shades Freed,Film Stars Don'T Die In Liverpool,...,The 15:17 To Paris,The Commuter,The Disaster Artist,The Greatest Showman,The Insult (L'Insulte),The Post,The Shape Of Water,"Three Billboards Outside Ebbing, Missouri",Till The End Of The World,Winchester
0,"{'Genre': 'Action', 'Gross': '$453,173', 'IMDB...","{'popcornscore': 83, 'rating': 'R', 'tomatosco...","{'popcornscore': 71, 'rating': 'R', 'tomatosco...","{'popcornscore': 91, 'rating': 'PG13', 'tomato...","{'popcornscore': 87, 'rating': 'R', 'tomatosco...","{'popcornscore': 84, 'rating': 'PG13', 'tomato...","{'Genre': 'Action', 'Gross': '$491,898', 'IMDB...","{'popcornscore': 49, 'rating': 'PG', 'tomatosc...","{'Genre': 'Drama', 'Gross': 'unknown', 'IMDB M...","{'popcornscore': 69, 'rating': 'R', 'tomatosco...",...,"{'Genre': 'Drama', 'Gross': 'unknown', 'IMDB M...","{'popcornscore': 48, 'rating': 'PG13', 'tomato...","{'popcornscore': 89, 'rating': 'R', 'tomatosco...","{'Genre': 'Biography', 'Gross': '$627,248', 'I...","{'popcornscore': 86, 'rating': 'R', 'tomatosco...","{'Genre': 'Biography', 'Gross': '$463,228', 'I...","{'Genre': 'Adventure', 'Gross': '$448,287', 'I...","{'popcornscore': 87, 'rating': 'R', 'tomatosco...","{'popcornscore': -1, 'rating': 'NR', 'tomatosc...","{'Genre': 'Biography', 'Gross': '$696,786', 'I..."
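The output above has one column per film and a dictionary in each cell, which is hard to work with. If the file is a list wrapping a single object keyed by film title (an assumption; the structure of `json_sample.json` is not shown here), one possible way to get one row per film is to load it with the standard `json` module and build the DataFrame with `DataFrame.from_dict(..., orient="index")`. The film names and values below are hypothetical.

```python
import json
import pandas as pd

# Hypothetical miniature of the nested structure; names and values are made up.
json_text = (
    '[{"Film A": {"rating": "R", "popcornscore": 83},'
    '  "Film B": {"rating": "PG", "popcornscore": 49}}]'
)

records = json.loads(json_text)

# Outer keys (film titles) become the index; inner keys become the columns.
by_film = pd.DataFrame.from_dict(records[0], orient="index")

print(by_film)
```

With a real file, `json.load()` on an open file handle would take the place of `json.loads()`.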


---

## **4. Reading Excel Files**  

### **4.1 Reading an Entire Excel File**
The `read_excel()` function reads an **Excel file** into a DataFrame.

In [8]:
df = pd.read_excel(r"dummy data\world_population_excel_workbook.xlsx")
df

Unnamed: 0,Rank,CCA3,Country,Capital,Continent,2022 Population,2020 Population,2015 Population,2010 Population,2000 Population,1990 Population,1980 Population,1970 Population,Area (km²),Density (per km²),Growth Rate,World Population Percentage
0,36,AFG,Afghanistan,Kabul,Asia,41128771,38972230,33753499,28189672,19542982,10694796,12486631,10752971,652230,63.0587,1.0257,0.52
1,138,ALB,Albania,Tirana,Europe,2842321,2866849,2882481,2913399,3182021,3295066,2941651,2324731,28748,98.8702,0.9957,0.04
2,34,DZA,Algeria,Algiers,Africa,44903225,43451666,39543154,35856344,30774621,25518074,18739378,13795915,2381741,18.8531,1.0164,0.56
3,213,ASM,American Samoa,Pago Pago,Oceania,44273,46189,51368,54849,58230,47818,32886,27075,199,222.4774,0.9831,0.00
4,203,AND,Andorra,Andorra la Vella,Europe,79824,77700,71746,71519,66097,53569,35611,19860,468,170.5641,1.0100,0.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
229,226,WLF,Wallis and Futuna,Mata-Utu,Oceania,11572,11655,12182,13142,14723,13454,11315,9377,142,81.4930,0.9953,0.00
230,172,ESH,Western Sahara,El Aaiún,Africa,575986,556048,491824,413296,270375,178529,116775,76371,266000,2.1654,1.0184,0.01
231,46,YEM,Yemen,Sanaa,Asia,33696614,32284046,28516545,24743946,18628700,13375121,9204938,6843607,527968,63.8232,1.0217,0.42
232,63,ZMB,Zambia,Lusaka,Africa,20017675,18927715,16248230,13792086,9891136,7686401,5720438,4281671,752612,26.5976,1.0280,0.25


### **4.2 Reading a Specific Sheet from an Excel File**
If an Excel file has **multiple sheets**, we can specify which sheet to read using the `sheet_name` parameter.

In [9]:
df = pd.read_excel(r"dummy data\world_population_excel_workbook.xlsx", 
                   sheet_name='Sheet1')
df

Unnamed: 0,Rank,CCA3,Country,Capital
0,36,AFG,Afghanistan,Kabul
1,138,ALB,Albania,Tirana
2,34,DZA,Algeria,Algiers
3,213,ASM,American Samoa,Pago Pago
4,203,AND,Andorra,Andorra la Vella
...,...,...,...,...
229,226,WLF,Wallis and Futuna,Mata-Utu
230,172,ESH,Western Sahara,El Aaiún
231,46,YEM,Yemen,Sanaa
232,63,ZMB,Zambia,Lusaka


---

## **5. Exploring the Dataset**  

### **5.1 Setting Display Options**
By default, Pandas limits the number of rows and columns displayed.  
We can adjust these settings to improve visibility.

In [10]:
pd.set_option('display.max_rows', 235)    # Display up to 235 rows
pd.set_option('display.max_columns', 40)  # Display up to 40 columns
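
`set_option()` changes the setting for the rest of the session. When a wider display is needed only for one cell, `pd.option_context()` can apply the options temporarily; the sketch below also uses `pd.get_option()` to confirm that the previous value is restored afterwards.

```python
import pandas as pd

# Remember the current limit so it can be compared afterwards.
before = pd.get_option("display.max_rows")

# Options set here apply only inside the `with` block.
with pd.option_context("display.max_rows", 235, "display.max_columns", 40):
    inside = pd.get_option("display.max_rows")

after = pd.get_option("display.max_rows")

print(inside)           # 235
print(before == after)  # True: the previous setting is restored
```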

### **5.2 Displaying Dataset Information**
The `info()` method provides an overview of the dataset, including **column names, data types, and non-null counts** (which reveal missing values).

In [11]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 234 entries, 0 to 233
Data columns (total 4 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Rank     234 non-null    int64 
 1   CCA3     234 non-null    object
 2   Country  234 non-null    object
 3   Capital  234 non-null    object
dtypes: int64(1), object(3)
memory usage: 7.4+ KB
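
`info()` reports non-null counts rather than missing counts directly. To count missing values per column, `isna().sum()` is a common complement. The small DataFrame below is a hypothetical example with one missing capital, built only to illustrate the call.

```python
import pandas as pd

# Hypothetical three-row frame with one missing value in "Capital".
sample = pd.DataFrame({
    "Country": ["Afghanistan", "Albania", "Algeria"],
    "Capital": ["Kabul", None, "Algiers"],
})

# isna() marks missing cells as True; sum() counts them per column.
missing = sample.isna().sum()

print(missing)
```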


### **5.3 Checking Dataset Shape**
The `shape` attribute returns the **number of rows and columns** in the dataset.

In [12]:
df.shape

(234, 4)

### **5.4 Viewing the First Few Rows**
The `head()` function returns the **first 5 rows** of the dataset (by default).  


In [13]:
df.head()

Unnamed: 0,Rank,CCA3,Country,Capital
0,36,AFG,Afghanistan,Kabul
1,138,ALB,Albania,Tirana
2,34,DZA,Algeria,Algiers
3,213,ASM,American Samoa,Pago Pago
4,203,AND,Andorra,Andorra la Vella


### **5.5 Viewing the Last Few Rows**
The `tail()` function returns the **last 5 rows** of the dataset (by default).


In [14]:
df.tail()

Unnamed: 0,Rank,CCA3,Country,Capital
229,226,WLF,Wallis and Futuna,Mata-Utu
230,172,ESH,Western Sahara,El Aaiún
231,46,YEM,Yemen,Sanaa
232,63,ZMB,Zambia,Lusaka
233,74,ZWE,Zimbabwe,Harare
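
Both `head()` and `tail()` accept a row count, and `shape` can be unpacked into separate variables. The following sketch uses a small stand-in DataFrame built from rows shown in the outputs above, so it runs without the Excel file.

```python
import pandas as pd

# Stand-in frame using a few rows from the outputs above.
sample = pd.DataFrame({
    "Rank": [36, 138, 34, 213, 203],
    "Country": ["Afghanistan", "Albania", "Algeria",
                "American Samoa", "Andorra"],
})

n_rows, n_cols = sample.shape   # shape is a (rows, columns) tuple

first_two = sample.head(2)      # first 2 rows instead of the default 5
last_three = sample.tail(3)     # last 3 rows
all_but_last = sample.head(-1)  # a negative n drops rows from the end

print(n_rows, n_cols)           # 5 2
print(list(first_two["Country"]))
print(list(last_three["Country"]))
```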
