# **Pandas File Reading**

In this part, we are going to learn about:

1. StringIO
2. Pandas read_csv
3. Pandas to_csv

## **StringIO**

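`StringIO` (from Python's standard `io` module) wraps a string in an in-memory, file-like object. Functions that expect a file, such as `pd.read_csv()`, can then read from the string directly.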

In [5]:
# importing StringIO, plus pandas (needed by the read_csv cell in this section)
from io import StringIO
import pandas as pd

In [11]:
# creating a string data
data = ('Col1,Col2,Col3\n'
        'a,b,1\n'
        'c,d,2\n'
        'e,f,3')
print(type(data))

<class 'str'>


In [12]:
# in memory file format object
StringIO(data)

<_io.StringIO at 0x7d03540bde10>
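Because the wrapped object is file-like, the usual file methods work on it. A minimal sketch (reusing the same `data` string; the names `buffer`, `header` and `rest` are only illustrative) also shows that reading advances the buffer's position, so a buffer that has already been read returns nothing more until it is rewound with `seek(0)`:

```python
from io import StringIO

# The same CSV text used above
data = ('Col1,Col2,Col3\n'
        'a,b,1\n'
        'c,d,2\n'
        'e,f,3')

buffer = StringIO(data)

# Read the header line, exactly as with an open file
header = buffer.readline()
print(repr(header))  # 'Col1,Col2,Col3\n'

# The position is now past the header, so read() returns only the rows
rest = buffer.read()
print(rest.splitlines())  # ['a,b,1', 'c,d,2', 'e,f,3']

# Rewind to the start before reading (or passing to read_csv) again
buffer.seek(0)
print(buffer.read() == data)  # True
```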

In [13]:
# creating a dataframe by reading in memory file format object
pd.read_csv(StringIO(data))

|   | Col1 | Col2 | Col3 |
|---|------|------|------|
| 0 | a | b | 1 |
| 1 | c | d | 2 |
| 2 | e | f | 3 |


## **Pandas read_csv**

Pandas has a built-in function, `read_csv()`, that reads a comma-separated values (CSV) file and converts it into a pandas DataFrame.
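`read_csv()` takes many optional parameters. As a small sketch, here are three common ones, `header`, `names` and `nrows`, applied to a hypothetical headerless sample held in a `StringIO` buffer:

```python
from io import StringIO
import pandas as pd

# Hypothetical CSV text with no header line, used only for illustration
raw = ('a,b,1\n'
       'c,d,2\n'
       'e,f,3')

# header=None: treat the first line as data rather than as column names
# names=[...]: supply the column labels ourselves
df = pd.read_csv(StringIO(raw), header=None, names=['Col1', 'Col2', 'Col3'])
print(df)

# nrows limits how many data rows are parsed
first_two = pd.read_csv(StringIO(raw), header=None,
                        names=['Col1', 'Col2', 'Col3'], nrows=2)
print(len(first_two))  # 2
```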

In [1]:
# importing pandas
import pandas as pd

**Reading a csv file**

In [3]:
# reading a csv file using read_csv
df = pd.read_csv('/content/mercedesbenz.csv')
df.head()

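|   | ID | y | X0 | X1 | X2 | X3 | X4 | X5 | X6 | X8 | ... | X375 | X376 | X377 | X378 | X379 | X380 | X382 | X383 | X384 | X385 |
|---|----|---|----|----|----|----|----|----|----|----|-----|------|------|------|------|------|------|------|------|------|------|
| 0 | 0 | 130.81 | k | v | at | a | d | u | j | o | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 6 | 88.53 | k | t | av | e | d | y | l | o | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 7 | 76.26 | az | w | n | c | d | x | j | x | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 3 | 9 | 80.62 | az | t | n | f | d | x | l | e | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 13 | 78.02 | az | v | n | f | d | h | d | n | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |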
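The Mercedes-Benz file sits at a Colab path (`/content/...`) that will not exist elsewhere. A minimal sketch, assuming a hypothetical small file `sample.csv` written to a temporary directory with a few values copied from the output above, shows that reading by path works the same way:

```python
import os
import tempfile
import pandas as pd

# Hypothetical stand-in for '/content/mercedesbenz.csv'
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.csv')
    with open(path, 'w') as f:
        f.write('ID,y,X0\n0,130.81,k\n6,88.53,k\n7,76.26,az\n')

    # read_csv accepts a file path just as it accepts a StringIO buffer;
    # the data is fully loaded into memory, so df outlives the temp file
    df = pd.read_csv(path)

print(df.head(2))  # head(n) shows the first n rows (5 by default)
print(df.shape)    # (3, 3)
```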


**Reading a csv file with specified columns**

In [14]:
# reading a csv file with specified columns
df = pd.read_csv(filepath_or_buffer='/content/mercedesbenz.csv', usecols=['X0', 'X1', 'X2', 'X3', 'X4', 'X5'])
df.head()

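|   | X0 | X1 | X2 | X3 | X4 | X5 |
|---|----|----|----|----|----|----|
| 0 | k | v | at | a | d | u |
| 1 | k | t | av | e | d | y |
| 2 | az | w | n | c | d | x |
| 3 | az | t | n | f | d | x |
| 4 | az | v | n | f | d | h |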
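Besides a list of column names, `usecols` also accepts integer positions or a callable that is evaluated against each column name. A sketch on a hypothetical few-column excerpt (values copied from the output above) rather than the full file:

```python
from io import StringIO
import pandas as pd

# Small hypothetical excerpt standing in for the Mercedes-Benz file
data = ('ID,y,X0,X1,X2\n'
        '0,130.81,k,v,at\n'
        '6,88.53,k,t,av\n')

# A callable keeps every column for which it returns True
by_name = pd.read_csv(StringIO(data), usecols=lambda col: col.startswith('X'))
print(list(by_name.columns))  # ['X0', 'X1', 'X2']

# Integer positions (0-based) select columns by where they appear in the file
by_position = pd.read_csv(StringIO(data), usecols=[0, 2])
print(list(by_position.columns))  # ['ID', 'X0']
```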


**Reading a file with specified column data types**

In [21]:
# creating a string data
data = ('Col1,Col2,Col3\n'
        '1,2,10\n'
        '3,4,20\n'
        '5,6,30')

In [22]:
# reading data
df = pd.read_csv(StringIO(data))
df.head()

|   | Col1 | Col2 | Col3 |
|---|------|------|------|
| 0 | 1 | 2 | 10 |
| 1 | 3 | 4 | 20 |
| 2 | 5 | 6 | 30 |


In [23]:
# info of dataframe
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Col1    3 non-null      int64
 1   Col2    3 non-null      int64
 2   Col3    3 non-null      int64
dtypes: int64(3)
memory usage: 200.0 bytes


In [26]:
# reading the string data with specified datatype
df = pd.read_csv(StringIO(data), dtype='float64')
df.head()

|   | Col1 | Col2 | Col3 |
|---|------|------|------|
| 0 | 1.0 | 2.0 | 10.0 |
| 1 | 3.0 | 4.0 | 20.0 |
| 2 | 5.0 | 6.0 | 30.0 |


In [27]:
# dataframe info
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Col1    3 non-null      float64
 1   Col2    3 non-null      float64
 2   Col3    3 non-null      float64
dtypes: float64(3)
memory usage: 200.0 bytes


In [29]:
# reading the string data with specified datatype
df = pd.read_csv(StringIO(data), dtype={'Col1': 'int32', 'Col2': 'float64', 'Col3': 'object'})
df.head()

|   | Col1 | Col2 | Col3 |
|---|------|------|------|
| 0 | 1 | 2.0 | 10 |
| 1 | 3 | 4.0 | 20 |
| 2 | 5 | 6.0 | 30 |


In [30]:
# dataframe info
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Col1    3 non-null      int32  
 1   Col2    3 non-null      float64
 2   Col3    3 non-null      object 
dtypes: float64(1), int32(1), object(1)
memory usage: 188.0+ bytes
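A `dtype` dictionary does not have to list every column: any column left out keeps the type pandas infers for it. A short sketch reusing the numeric sample data:

```python
from io import StringIO
import pandas as pd

data = ('Col1,Col2,Col3\n'
        '1,2,10\n'
        '3,4,20\n'
        '5,6,30')

# Only Col1 gets an explicit dtype; Col2 and Col3 are inferred as usual
df = pd.read_csv(StringIO(data), dtype={'Col1': 'float64'})

# Col1 is float64, while Col2 and Col3 keep their inferred integer dtype
print(df.dtypes)
```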


**Reading a file with a specified index column**

In [31]:
# creating a string data
data = ('index,Col1,Col2,Col3\n'
        '4,apple,bat,5.7\n'
        '8,orange,cow,10')

In [32]:
# reading file without specified index
pd.read_csv(StringIO(data))

|   | index | Col1 | Col2 | Col3 |
|---|-------|------|------|------|
| 0 | 4 | apple | bat | 5.7 |
| 1 | 8 | orange | cow | 10.0 |


In [34]:
# reading file with specified index
df = pd.read_csv(StringIO(data), index_col=0)
df

| index | Col1 | Col2 | Col3 |
|-------|------|------|------|
| 4 | apple | bat | 5.7 |
| 8 | orange | cow | 10.0 |
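`index_col` accepts a column name as well as a position, and once a column becomes the index its labels can be used with `.loc`. A brief sketch with the same sample data:

```python
from io import StringIO
import pandas as pd

data = ('index,Col1,Col2,Col3\n'
        '4,apple,bat,5.7\n'
        '8,orange,cow,10')

# Selecting the index column by name is equivalent to index_col=0 here
df = pd.read_csv(StringIO(data), index_col='index')
print(df.index.tolist())  # [4, 8]

# Rows can now be looked up by their index label
print(df.loc[8, 'Col1'])  # orange
```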


**Reading a file with a specified index and selected columns**

In [36]:
# creating a string data
data = ('a,Col1,Col2,Col3\n'
        '4,apple,bat,5.7\n'
        '8,orange,cow,10')

In [41]:
# reading file with specified index and columns
df = pd.read_csv(StringIO(data), index_col='a', usecols=['a', 'Col1', 'Col3'])
df

| a | Col1 | Col3 |
|---|------|------|
| 4 | apple | 5.7 |
| 8 | orange | 10.0 |


**Reading a tab-separated file**

In [44]:
# creating a string data
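# fields are separated by tab characters ('\t'), not spaces
data = ('a\tCol1\tCol2\tCol3\n'
        '4\tapple\tbat\t5.7\n'
        '8\torange\tcow\t10')

In [50]:
# reading a tab separated file
df = pd.read_csv(StringIO(data), sep='\t')
df

|   | a | Col1 | Col2 | Col3 |
|---|---|------|------|------|
| 0 | 4 | apple | bat | 5.7 |
| 1 | 8 | orange | cow | 10.0 |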
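Text is sometimes aligned with runs of spaces rather than tab characters, and `sep='\t'` will not split such text into columns. A sketch of two related options: `read_table()`, which is `read_csv()` with a tab as its default separator, and the regular-expression separator `r'\s+'` for runs of whitespace:

```python
from io import StringIO
import pandas as pd

tab_data = ('a\tCol1\tCol2\tCol3\n'
            '4\tapple\tbat\t5.7\n'
            '8\torange\tcow\t10')

# read_table defaults to sep='\t', so both calls parse the same frame
df_table = pd.read_table(StringIO(tab_data))
df_csv = pd.read_csv(StringIO(tab_data), sep='\t')
print(df_table.equals(df_csv))  # True

# Space-aligned text: treat any run of whitespace as one separator
space_data = ('a  Col1  Col2  Col3\n'
              '4  apple  bat  5.7\n'
              '8  orange  cow 10')
df_space = pd.read_csv(StringIO(space_data), sep=r'\s+')
print(list(df_space.columns))  # ['a', 'Col1', 'Col2', 'Col3']
```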


## **Pandas to_csv**

`to_csv()` writes a DataFrame to a comma-separated values (CSV) text file. When it is given a relative file name such as `'test.csv'`, the file is saved in the current working directory.
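If no path is passed, `to_csv()` returns the CSV text as a string instead of writing a file, which pairs naturally with `StringIO` for a quick round trip. A sketch with a small hypothetical DataFrame built from a few of the values shown in this section:

```python
from io import StringIO
import pandas as pd

# Hypothetical DataFrame for illustration
df = pd.DataFrame({'X0': ['k', 'k', 'az'], 'X1': ['v', 't', 'w']})

# With no path argument, to_csv returns the CSV text as a string
text = df.to_csv(index=False)
print(text)

# Reading the string back through StringIO restores an equal DataFrame
restored = pd.read_csv(StringIO(text))
print(restored.equals(df))  # True
```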

In [15]:
# reading a csv file
df = pd.read_csv(filepath_or_buffer='/content/mercedesbenz.csv', usecols=['X0', 'X1', 'X2', 'X3', 'X4', 'X5'])
df.head()

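|   | X0 | X1 | X2 | X3 | X4 | X5 |
|---|----|----|----|----|----|----|
| 0 | k | v | at | a | d | u |
| 1 | k | t | av | e | d | y |
| 2 | az | w | n | c | d | x |
| 3 | az | t | n | f | d | x |
| 4 | az | v | n | f | d | h |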


In [17]:
# converting the above dataframe into a csv file
df.to_csv('test.csv', index=False)
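The `index=False` argument matters: by default `to_csv()` also writes the row index as an unnamed first column, which `read_csv()` later reports as `Unnamed: 0`. A sketch comparing both settings on a small hypothetical DataFrame, using in-memory strings instead of `test.csv`:

```python
from io import StringIO
import pandas as pd

# Hypothetical DataFrame for illustration
df = pd.DataFrame({'X0': ['k', 'k', 'az'], 'X1': ['v', 't', 'w']})

# Default index=True: the index is written as a column with an empty header
with_index = pd.read_csv(StringIO(df.to_csv()))
print(list(with_index.columns))  # ['Unnamed: 0', 'X0', 'X1']

# index=False: only the data columns are written
without_index = pd.read_csv(StringIO(df.to_csv(index=False)))
print(list(without_index.columns))  # ['X0', 'X1']

# Alternatively, keep the index in the file and restore it with index_col=0
restored = pd.read_csv(StringIO(df.to_csv()), index_col=0)
print(list(restored.columns))  # ['X0', 'X1']
```

Writing with `index=False` is the simpler choice when the index is just the default 0, 1, 2, ...; `index_col=0` on read is the alternative when the index carries meaningful labels.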