# Reading Files with Pandas

This notebook demonstrates various methods for reading different file formats using the Pandas library in Python. We'll cover CSV, tab-separated text files, JSON, and Excel files.

## Introduction to Pandas
Pandas is a powerful open-source Python library built on top of NumPy, designed for data manipulation and analysis. It provides two primary data structures: Series (1-dimensional) and DataFrame (2-dimensional), which are essential for handling structured data efficiently. Pandas excels in tasks like data cleaning, transformation, aggregation, and visualization, making it a cornerstone for data science workflows.

## Reading Files with Pandas
Pandas offers a comprehensive set of functions to read tabular data from various file formats. Below, we read CSV files and tab-separated text files with both `pd.read_csv()` and `pd.read_table()`. Both functions accept parameters such as `sep`, `encoding`, and `header`, so the same call can be adapted to different delimiters, encodings, and file layouts.

In [81]:
import pandas as pd

In [82]:
# Read a CSV file; the default separator is a comma
pd.read_csv(r'E:/PortfolioProjects/Python Project/learning pandas/countries of the world.csv')

Unnamed: 0,Country,Region
0,Afghanistan,ASIA (EX. NEAR EAST)
1,Albania,EASTERN EUROPE
2,Algeria,NORTHERN AFRICA
3,American Samoa,OCEANIA
4,Andorra,WESTERN EUROPE
5,Angola,SUB-SAHARAN AFRICA
6,Anguilla,LATIN AMER. & CARIB
7,Antigua & Barbuda,LATIN AMER. & CARIB
8,Argentina,LATIN AMER. & CARIB
9,Armenia,C.W. OF IND. STATES


In [83]:
# Read a .txt file whose fields are separated by tabs
pd.read_csv(r'E:/PortfolioProjects/Python Project/learning pandas/countries of the world.txt', sep='\t')

Unnamed: 0,Country,Region
0,Afghanistan,ASIA (EX. NEAR EAST)
1,Albania,EASTERN EUROPE
2,Algeria,NORTHERN AFRICA
3,American Samoa,OCEANIA
4,Andorra,WESTERN EUROPE
5,Angola,SUB-SAHARAN AFRICA
6,Anguilla,LATIN AMER. & CARIB
7,Antigua & Barbuda,LATIN AMER. & CARIB
8,Argentina,LATIN AMER. & CARIB
9,Armenia,C.W. OF IND. STATES
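
The effect of `sep` and `encoding` can be checked without any files on disk. The following is a minimal sketch: the in-memory strings and the Latin-1 sample are hypothetical stand-ins for real files, not the data used in this notebook.

```python
import io
import pandas as pd

# Hypothetical in-memory data standing in for files on disk.
comma_text = "Country,Region\nAfghanistan,ASIA (EX. NEAR EAST)\nAlbania,EASTERN EUROPE\n"
tab_text = comma_text.replace(",", "\t")

# sep="," is the default for read_csv, so it can be omitted.
df_comma = pd.read_csv(io.StringIO(comma_text))
# For tab-separated text, pass sep="\t" explicitly.
df_tab = pd.read_csv(io.StringIO(tab_text), sep="\t")

# Both calls should produce the same DataFrame.
print(df_comma.equals(df_tab))

# Bytes written in a non-UTF-8 encoding need a matching `encoding` argument.
latin_bytes = "Capital\nEl Aaiún\n".encode("latin-1")
df_latin = pd.read_csv(io.BytesIO(latin_bytes), encoding="latin-1")
print(df_latin.loc[0, "Capital"])
```

Passing the wrong `encoding` is a common cause of garbled accented characters in the loaded data.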


## Using read_table
`pd.read_table()` reads delimited text files. It defaults to tab-separated values but accepts any delimiter through the `sep` parameter, so it also works for comma-separated or other custom-separated files. It is essentially `pd.read_csv()` with a different default separator.

In [84]:
# Read a CSV file with read_table by overriding the default tab separator with a comma
pd.read_table(r'E:/PortfolioProjects/Python Project/learning pandas/countries of the world.csv', sep=',')

Unnamed: 0,Country,Region
0,Afghanistan,ASIA (EX. NEAR EAST)
1,Albania,EASTERN EUROPE
2,Algeria,NORTHERN AFRICA
3,American Samoa,OCEANIA
4,Andorra,WESTERN EUROPE
5,Angola,SUB-SAHARAN AFRICA
6,Anguilla,LATIN AMER. & CARIB
7,Antigua & Barbuda,LATIN AMER. & CARIB
8,Argentina,LATIN AMER. & CARIB
9,Armenia,C.W. OF IND. STATES


In [85]:
# Read a tab-separated .txt file; sep='\t' is already the default for read_table
pd.read_table(r'E:/PortfolioProjects/Python Project/learning pandas/countries of the world.txt', sep='\t')

Unnamed: 0,Country,Region
0,Afghanistan,ASIA (EX. NEAR EAST)
1,Albania,EASTERN EUROPE
2,Algeria,NORTHERN AFRICA
3,American Samoa,OCEANIA
4,Andorra,WESTERN EUROPE
5,Angola,SUB-SAHARAN AFRICA
6,Anguilla,LATIN AMER. & CARIB
7,Antigua & Barbuda,LATIN AMER. & CARIB
8,Argentina,LATIN AMER. & CARIB
9,Armenia,C.W. OF IND. STATES
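
To make the relationship between the two functions concrete, here is a small sketch that parses the same text with both. The sample strings are hypothetical; only `pd.read_table()`, `pd.read_csv()`, and the `sep` parameter come from the cells above.

```python
import io
import pandas as pd

# Hypothetical tab-separated sample held in memory.
tab_text = "Country\tRegion\nAlgeria\tNORTHERN AFRICA\nAndorra\tWESTERN EUROPE\n"

# read_table uses a tab as its default separator...
via_table = pd.read_table(io.StringIO(tab_text))
# ...which matches read_csv with sep="\t".
via_csv = pd.read_csv(io.StringIO(tab_text), sep="\t")
print(via_table.equals(via_csv))

# Any other single-character delimiter works the same way through `sep`.
semi_text = tab_text.replace("\t", ";")
via_table_semi = pd.read_table(io.StringIO(semi_text), sep=";")
print(via_table_semi.equals(via_table))
```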


## Reading JSON Files
JSON (JavaScript Object Notation) is a popular format for data interchange. Pandas provides `pd.read_json()` to read JSON files and convert them into DataFrames. The function accepts several JSON layouts through its `orient` parameter, but it does not flatten nested objects: they are loaded as dictionaries inside individual cells, as the output below shows. It is useful for working with API responses or other data stored in JSON format.

In [86]:
pd.read_json(r'E:\PortfolioProjects\Python Project\learning pandas\json_sample.json')

Unnamed: 0,12 Strong,A Fantastic Woman (Una Mujer Fantástica),All The Money In The World,Bilal: A New Breed Of Hero,Call Me By Your Name,Darkest Hour,Den Of Thieves,Ferdinand,Fifty Shades Freed,Film Stars Don'T Die In Liverpool,Forever My Girl,Golden Exits,Hostiles,"I, Tonya",Insidious: The Last Key,Jumanji: Welcome To The Jungle,Mary And The Witch'S Flower,Maze Runner: The Death Cure,Molly'S Game,Paddington 2,Padmaavat,Permission,Peter Rabbit,Phantom Thread,Pitch Perfect 3,Proud Mary,Sanpo Suru Shinryakusha,Star Wars: The Last Jedi,The 15:17 To Paris,The Commuter,The Disaster Artist,The Greatest Showman,The Insult (L'Insulte),The Post,The Shape Of Water,"Three Billboards Outside Ebbing, Missouri",Till The End Of The World,Winchester
0,"{'Genre': 'Action', 'Gross': '$453,173', 'IMDB Metascore': '54', 'Popcorn Score': 72, 'Rating': ...","{'popcornscore': 83, 'rating': 'R', 'tomatoscore': 90}","{'popcornscore': 71, 'rating': 'R', 'tomatoscore': 77}","{'popcornscore': 91, 'rating': 'PG13', 'tomatoscore': 57}","{'popcornscore': 87, 'rating': 'R', 'tomatoscore': 96}","{'popcornscore': 84, 'rating': 'PG13', 'tomatoscore': 86}","{'Genre': 'Action', 'Gross': '$491,898', 'IMDB Metascore': '49', 'Popcorn Score': 69, 'Rating': ...","{'popcornscore': 49, 'rating': 'PG', 'tomatoscore': 71}","{'Genre': 'Drama', 'Gross': 'unknown', 'IMDB Metascore': '34', 'Popcorn Score': 'unknown', 'Rati...","{'popcornscore': 69, 'rating': 'R', 'tomatoscore': 78}","{'popcornscore': 91, 'rating': 'PG', 'tomatoscore': 21}","{'Genre': 'Drama', 'Gross': 'unknown', 'IMDB Metascore': '72', 'Popcorn Score': 'unknown', 'Rati...","{'Genre': 'Adventure', 'Gross': '$548,886', 'IMDB Metascore': '65', 'Popcorn Score': 71, 'Rating...","{'popcornscore': 89, 'rating': 'R', 'tomatoscore': 90}","{'popcornscore': 51, 'rating': 'PG13', 'tomatoscore': 32}","{'Genre': 'Action', 'Gross': '$760,867', 'IMDB Metascore': '58', 'Popcorn Score': 89, 'Rating': ...","{'popcornscore': 78, 'rating': 'PG', 'tomatoscore': 84}","{'Genre': 'Action', 'Gross': '$720,463', 'IMDB Metascore': '51', 'Popcorn Score': 71, 'Rating': ...","{'popcornscore': 85, 'rating': 'R', 'tomatoscore': 82}","{'Genre': 'Animation', 'Gross': '$184,414', 'IMDB Metascore': '88', 'Popcorn Score': 89, 'Rating...","{'popcornscore': 62, 'rating': 'NR', 'tomatoscore': 74}","{'Genre': 'Comedy', 'Gross': 'unknown', 'IMDB Metascore': '53', 'Popcorn Score': 'unknown', 'Rat...","{'Genre': 'Animation', 'Gross': 'unknown', 'IMDB Metascore': '56', 'Popcorn Score': 'unknown', '...","{'popcornscore': 68, 'rating': 'R', 'tomatoscore': 91}","{'popcornscore': 52, 'rating': 'PG13', 'tomatoscore': 31}","{'popcornscore': 56, 'rating': 'R', 'tomatoscore': 26}","{'Genre': 'Drama', 'Gross': 'unknown', 'IMDB Metascore': '65', 'Popcorn Score': 'unknown', 'Rati...","{'popcornscore': 48, 'rating': 'PG13', 'tomatoscore': 91}","{'Genre': 'Drama', 'Gross': 'unknown', 'IMDB Metascore': '52', 'Popcorn Score': 'unknown', 'Rati...","{'popcornscore': 48, 'rating': 'PG13', 'tomatoscore': 58}","{'popcornscore': 89, 'rating': 'R', 'tomatoscore': 91}","{'Genre': 'Biography', 'Gross': '$627,248', 'IMDB Metascore': '48', 'Popcorn Score': 90, 'Rating...","{'popcornscore': 86, 'rating': 'R', 'tomatoscore': 89}","{'Genre': 'Biography', 'Gross': '$463,228', 'IMDB Metascore': '83', 'Popcorn Score': 73, 'Rating...","{'Genre': 'Adventure', 'Gross': '$448,287', 'IMDB Metascore': '86', 'Popcorn Score': 78, 'Rating...","{'popcornscore': 87, 'rating': 'R', 'tomatoscore': 93}","{'popcornscore': -1, 'rating': 'NR', 'tomatoscore': None}","{'Genre': 'Biography', 'Gross': '$696,786', 'IMDB Metascore': '28', 'Popcorn Score': 40, 'Rating..."
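
The output above is a single row whose cells are dictionaries, which is hard to analyze directly. One possible way to reshape such data is sketched below. It assumes the file is a list containing one object keyed by film title (inferred from the output, not confirmed), and the two film entries are hypothetical stand-ins.

```python
import io
import json
import pandas as pd

# Hypothetical JSON text with the assumed layout: a list holding one
# object whose keys are titles and whose values are nested objects.
raw = json.dumps([{
    "Film A": {"popcornscore": 83, "rating": "R", "tomatoscore": 90},
    "Film B": {"popcornscore": 71, "rating": "PG", "tomatoscore": 77},
}])

# read_json yields one row; each cell holds the nested object for a title.
wide = pd.read_json(io.StringIO(raw))
print(wide.shape)

# Parsing with the json module and using from_dict(orient="index")
# gives one row per title and one column per nested key instead.
record = json.loads(raw)[0]
tidy = pd.DataFrame.from_dict(record, orient="index")
print(tidy)
```

With one row per film, columns such as `tomatoscore` can be filtered and sorted like any other DataFrame column.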


## Reading Excel Files

In [87]:
# Raise the display limits so wide and long DataFrames are shown in full
pd.set_option('display.max_rows', 234)
pd.set_option('display.max_columns', 40)
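
`pd.set_option()` changes the display settings for the rest of the session. If the larger limits are only needed for one cell, `pd.option_context()` is an alternative worth considering; the sketch below reuses the values from the cell above and is an illustration, not part of the original notebook.

```python
import pandas as pd

# Remember the current setting so the effect is visible afterwards.
rows_before = pd.get_option("display.max_rows")

# Inside the with-block the options are overridden temporarily.
with pd.option_context("display.max_rows", 234, "display.max_columns", 40):
    rows_inside = pd.get_option("display.max_rows")

# On leaving the block, the previous value is restored.
rows_after = pd.get_option("display.max_rows")
print(rows_before, rows_inside, rows_after)
```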

In [88]:
pd.read_excel(r'E:\PortfolioProjects\Python Project\learning pandas\world_population_excel_workbook.xlsx')

Unnamed: 0,Rank,CCA3,Country,Capital,Continent,2022 Population,2020 Population,2015 Population,2010 Population,2000 Population,1990 Population,1980 Population,1970 Population,Area (km²),Density (per km²),Growth Rate,World Population Percentage
0,36,AFG,Afghanistan,Kabul,Asia,41128771,38972230,33753499,28189672,19542982,10694796,12486631,10752971,652230,63.0587,1.0257,0.52
1,138,ALB,Albania,Tirana,Europe,2842321,2866849,2882481,2913399,3182021,3295066,2941651,2324731,28748,98.8702,0.9957,0.04
2,34,DZA,Algeria,Algiers,Africa,44903225,43451666,39543154,35856344,30774621,25518074,18739378,13795915,2381741,18.8531,1.0164,0.56
3,213,ASM,American Samoa,Pago Pago,Oceania,44273,46189,51368,54849,58230,47818,32886,27075,199,222.4774,0.9831,0.0
4,203,AND,Andorra,Andorra la Vella,Europe,79824,77700,71746,71519,66097,53569,35611,19860,468,170.5641,1.01,0.0
5,42,AGO,Angola,Luanda,Africa,35588987,33428485,28127721,23364185,16394062,11828638,8330047,6029700,1246700,28.5466,1.0315,0.45
6,224,AIA,Anguilla,The Valley,North America,15857,15585,14525,13172,11047,8316,6560,6283,91,174.2527,1.0066,0.0
7,201,ATG,Antigua and Barbuda,Saint John’s,North America,93763,92664,89941,85695,75055,63328,64888,64516,442,212.1335,1.0058,0.0
8,33,ARG,Argentina,Buenos Aires,South America,45510318,45036032,43257065,41100123,37070774,32637657,28024803,23842803,2780400,16.3683,1.0052,0.57
9,140,ARM,Armenia,Yerevan,Asia,2780469,2805608,2878595,2946293,3168523,3556539,3135123,2534377,29743,93.4831,0.9962,0.03


In [89]:
# Read one worksheet by name and keep the result for the cells below
df2 = pd.read_excel(r'E:\PortfolioProjects\Python Project\learning pandas\world_population_excel_workbook.xlsx', sheet_name='Sheet1')

In [90]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 234 entries, 0 to 233
Data columns (total 4 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Rank     234 non-null    int64 
 1   CCA3     234 non-null    object
 2   Country  234 non-null    object
 3   Capital  234 non-null    object
dtypes: int64(1), object(3)
memory usage: 7.4+ KB


In [91]:
df2.shape

(234, 4)

In [92]:
df2.head(10)

Unnamed: 0,Rank,CCA3,Country,Capital
0,36,AFG,Afghanistan,Kabul
1,138,ALB,Albania,Tirana
2,34,DZA,Algeria,Algiers
3,213,ASM,American Samoa,Pago Pago
4,203,AND,Andorra,Andorra la Vella
5,42,AGO,Angola,Luanda
6,224,AIA,Anguilla,The Valley
7,201,ATG,Antigua and Barbuda,Saint John’s
8,33,ARG,Argentina,Buenos Aires
9,140,ARM,Armenia,Yerevan


In [93]:
df2.tail(10)

Unnamed: 0,Rank,CCA3,Country,Capital
224,43,UZB,Uzbekistan,Tashkent
225,181,VUT,Vanuatu,Port-Vila
226,234,VAT,Vatican City,Vatican City
227,51,VEN,Venezuela,Caracas
228,16,VNM,Vietnam,Hanoi
229,226,WLF,Wallis and Futuna,Mata-Utu
230,172,ESH,Western Sahara,El Aaiún
231,46,YEM,Yemen,Sanaa
232,63,ZMB,Zambia,Lusaka
233,74,ZWE,Zimbabwe,Harare


In [94]:
df2['Rank']

0       36
1      138
2       34
3      213
4      203
5       42
6      224
7      201
8       33
9      140
10     198
11      55
12      99
13      91
14     176
15     154
16       8
17     186
18      96
19      81
20     177
21      77
22     206
23     165
24      80
25     137
26     144
27       7
28     221
29     175
30     108
31      58
32      78
33      73
34      53
35      39
36     171
37     205
38     117
39      69
40      65
41       1
42      28
43     163
44     223
45     124
46     130
47      85
48     189
49     158
50      88
51     115
52     160
53     204
54      84
55      15
56      67
57      14
58     112
59     152
60     132
61     156
62     159
63      12
64     231
65     209
66     162
67     118
68      23
69     184
70     183
71     146
72     142
73     131
74      19
75      47
76     219
77      90
78     208
79     193
80     178
81     191
82      68
83     207
84      75
85     149
86     164
87      82
88      89
89     104
90      94

In [97]:
# .loc selects a row by its index label
df2.loc[224]

Rank               43
CCA3              UZB
Country    Uzbekistan
Capital      Tashkent
Name: 224, dtype: object

In [96]:
# .iloc selects a row by integer position; with the default RangeIndex this is the same row
df2.iloc[224]

Rank               43
CCA3              UZB
Country    Uzbekistan
Capital      Tashkent
Name: 224, dtype: object
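
`df2.loc[224]` and `df2.iloc[224]` return the same row only because `df2` has the default `RangeIndex`, where labels and positions coincide. The sketch below uses a small hypothetical DataFrame (three rows taken from the values shown above) to illustrate how the two diverge once the rows are reordered.

```python
import pandas as pd

# Small stand-in DataFrame; the default index labels are 0, 1, 2.
df = pd.DataFrame(
    {"Country": ["Uzbekistan", "Vanuatu", "Vietnam"], "Rank": [43, 181, 16]}
)

# With the default index, label 1 and position 1 are the same row.
same_row = df.loc[1].equals(df.iloc[1])

# Sorting reorders the rows but keeps their original labels (2, 0, 1).
by_rank = df.sort_values("Rank")

label_row = by_rank.loc[1]      # row whose label is 1
position_row = by_rank.iloc[1]  # second row in the sorted order

print(same_row)
print(label_row["Country"], position_row["Country"])
```

Here `.loc[1]` still returns Vanuatu, while `.iloc[1]` returns Uzbekistan, the second row after sorting.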