# Reading Data from files
###### **19.01.2022**

### Reading data from CSV files

Let’s start by creating a toy example of a CSV file. You can do this directly in JupyterLab by opening a new text file. Enter a header line, a,b,c,d, followed by one comma-separated line per row (for example yellow,10,2,3.2), and save the file as c2_file.csv in the data folder.

Let’s now import our CSV file as a pandas DataFrame. To do this, we use the pandas function read_csv(), which reads a CSV file and returns its contents as a DataFrame. Since the file sits in the data folder inside our working directory, we can pass its relative path directly.

In [21]:
import numpy as np
import pandas as pd
df = pd.read_csv("data/c2_file.csv")
df

Unnamed: 0,a,b,c,d
0,yellow,10,2,3.2
1,green,2,3,8.1
2,blue,7,1,0.4
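
As an alternative to typing the file by hand, the toy file can be written from Python. The sketch below is only an illustration: the file contents are reconstructed from the DataFrame shown above, and the temporary folder and the names `csv_text`, `csv_path` and `df_toy` are assumptions chosen so that nothing in the working directory is overwritten.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Contents of the toy file, reconstructed from the DataFrame shown above
csv_text = "a,b,c,d\nyellow,10,2,3.2\ngreen,2,3,8.1\nblue,7,1,0.4\n"

# Write into a temporary folder (an assumption; the notes use data/c2_file.csv)
csv_path = Path(tempfile.mkdtemp()) / "c2_file.csv"
csv_path.write_text(csv_text)

# read_csv accepts a path object as well as a string
df_toy = pd.read_csv(csv_path)
print(df_toy)
```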


In [22]:
# If the file had no header row and its first line were already data, we would pass header=None
pd.read_csv("data/c2_file.csv", header=None)

Unnamed: 0,0,1,2,3
0,a,b,c,d
1,yellow,10,2,3.2
2,green,2,3,8.1
3,blue,7,1,0.4
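
To see why header=None matters, here is a small sketch on a hypothetical headerless variant of the toy file. The text is read from an in-memory buffer (`StringIO`) rather than from disk, and the variable names are illustrative only.

```python
from io import StringIO

import pandas as pd

# Hypothetical headerless variant of the toy file (no a,b,c,d line)
headerless_text = "yellow,10,2,3.2\ngreen,2,3,8.1\nblue,7,1,0.4\n"

# Default behaviour: the first line is consumed as column labels
df_default = pd.read_csv(StringIO(headerless_text))

# header=None: every line is data and the columns are numbered from 0
df_no_header = pd.read_csv(StringIO(headerless_text), header=None)

print(len(df_default), len(df_no_header))
print(list(df_no_header.columns))
```

Without header=None, one row of data is silently lost to the column labels.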


In [23]:
# We can also specify the column labels ourselves. Because this file already has a header row, that row is then read as data.
pd.read_csv("data/c2_file.csv", names=["column 1", "column 2", "column 3", "column 4"])

Unnamed: 0,column 1,column 2,column 3,column 4
0,a,b,c,d
1,yellow,10,2,3.2
2,green,2,3,8.1
3,blue,7,1,0.4
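
If the goal is to replace the existing header rather than keep it as a data row, names can be combined with header=0. The following sketch assumes the same toy contents, read from an in-memory buffer; the variable names are illustrative.

```python
from io import StringIO

import pandas as pd

csv_text = "a,b,c,d\nyellow,10,2,3.2\ngreen,2,3,8.1\nblue,7,1,0.4\n"
labels = ["column 1", "column 2", "column 3", "column 4"]

# names alone: the original header line a,b,c,d becomes the first data row
df_as_data = pd.read_csv(StringIO(csv_text), names=labels)

# names with header=0: the original header line is discarded and replaced
df_replaced = pd.read_csv(StringIO(csv_text), names=labels, header=0)

print(len(df_as_data), len(df_replaced))
print(df_replaced.dtypes)
```

Replacing the header also lets pandas infer numeric types again, since the columns no longer contain the strings from the old header line.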


In [24]:
# We can use a column as the index by passing its position to the index_col argument.
pd.read_csv("data/c2_file.csv", index_col=0)

a,b,c,d
yellow,10,2,3.2
green,2,3,8.1
blue,7,1,0.4
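
index_col also accepts a column label instead of a position, and once a column is the index its values can be used for label-based lookups. A minimal sketch, assuming the same toy contents read from an in-memory buffer:

```python
from io import StringIO

import pandas as pd

csv_text = "a,b,c,d\nyellow,10,2,3.2\ngreen,2,3,8.1\nblue,7,1,0.4\n"

# Select the index column by label instead of by position
df_indexed = pd.read_csv(StringIO(csv_text), index_col="a")

# Rows can now be selected by their value in column a
green_row = df_indexed.loc["green"]
print(green_row)
```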


In [25]:
# pandas automatically assigns each column the most general data type among the values found in that column.
df.dtypes

a     object
b      int64
c      int64
d    float64
dtype: object

In [26]:
# We can also force the data types of columns when creating the DataFrame.
# The requested type must be compatible with the values stored in the column.
df2 = pd.read_csv("data/c2_file.csv", dtype={"b": np.float64})
df2.dtypes

a     object
b    float64
c      int64
d    float64
dtype: object

In [27]:
# Loading partial data: usecols keeps only the listed columns
pd.read_csv("data/c2_file.csv", usecols=["a", "b"])

Unnamed: 0,a,b
0,yellow,10
1,green,2
2,blue,7
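
Partial loading can also be restricted by rows. As a sketch under the same toy-data assumption, usecols accepts column positions or a function of the column name, and nrows limits how many data rows are read:

```python
from io import StringIO

import pandas as pd

csv_text = "a,b,c,d\nyellow,10,2,3.2\ngreen,2,3,8.1\nblue,7,1,0.4\n"

# Columns selected by position, and only the first two data rows
df_head = pd.read_csv(StringIO(csv_text), usecols=[0, 1], nrows=2)

# Columns selected by a function of the column name (everything except a)
df_numeric = pd.read_csv(StringIO(csv_text), usecols=lambda name: name != "a")

print(df_head)
print(df_numeric)
```

For large files this avoids loading columns or rows that are never used.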


### Reading data from Excel files

We can import data from an Excel file using the pandas function pd.read_excel(). Let’s try this out:

In [17]:
# Install xlrd, the package pandas uses to read .xls files
!pip install xlrd



In [28]:
import pandas as pd

pd.read_excel("data/c2_data.xls")

Unnamed: 0,varA,varB,varC
0,0.391723,-0.155122,0.381104
1,0.575125,-0.105817,0.232245
2,0.672305,0.424688,-0.694795
3,0.766115,-0.79135,-0.028739
4,0.677259,-0.817543,-0.537088
5,-0.029702,-0.891848,-0.682719
6,-0.161366,-0.6596,-0.727898
7,0.031672,0.016607,-0.940479
8,0.833212,-0.503236,-0.88721
9,0.907753,0.265177,-0.390762


In [29]:
# If we want to load data from a different sheet of the workbook, we must specify its name or position with sheet_name.
pd.read_excel("data/c2_data.xls", sheet_name="Sheet2")

Unnamed: 0,varD,varE,varF
0,0.907753,0.265177,-0.390762
1,0.755019,-0.768056,-0.528307
2,0.850692,-0.537159,-0.601387
3,0.131663,0.941327,0.240073
4,0.5744,0.091735,-0.395277
5,0.81663,0.875612,-0.880044
6,0.536732,0.175428,-0.473053
7,-0.084641,-0.042827,0.053344
8,0.268271,-0.010628,-0.090952
9,0.166792,-0.872579,-0.556899
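
sheet_name also accepts a zero-based position, or None to load every sheet at once as a dictionary of DataFrames. The sketch below does not use c2_data.xls; it builds a small hypothetical .xlsx workbook in a temporary folder, which assumes the openpyxl package is installed (the code checks for it first). All values and names are made up for illustration.

```python
import importlib.util
import tempfile
from pathlib import Path

import pandas as pd

sheets = {}
if importlib.util.find_spec("openpyxl") is None:
    print("openpyxl is not installed; run `pip install openpyxl` first")
else:
    # Build a hypothetical two-sheet workbook to read back
    xlsx_path = Path(tempfile.mkdtemp()) / "demo.xlsx"
    with pd.ExcelWriter(xlsx_path, engine="openpyxl") as writer:
        pd.DataFrame({"varA": [0.1, 0.2]}).to_excel(writer, sheet_name="Sheet1", index=False)
        pd.DataFrame({"varD": [0.3, 0.4]}).to_excel(writer, sheet_name="Sheet2", index=False)

    # sheet_name=None returns {sheet name: DataFrame} for all sheets
    sheets = pd.read_excel(xlsx_path, sheet_name=None)

    # sheet_name=1 selects the second sheet by position
    second_sheet = pd.read_excel(xlsx_path, sheet_name=1)
    print(list(sheets))
    print(second_sheet)
```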
