Reading a CSV file using Pandas

In [1]:
from urllib.request import urlretrieve

In [2]:
italy_covid_url = 'https://gist.githubusercontent.com/aakashns/f6a004fa20c84fec53262f9a8bfee775/raw/f309558b1cf5103424cef58e2ecb8704dcd4d74c/italy-covid-daywise.csv'

urlretrieve(italy_covid_url, 'italy-covid-daywise.csv')

('italy-covid-daywise.csv', <http.client.HTTPMessage at 0x1a5b0aeb4d0>)

To read the file, we can use the "read_csv" function from Pandas, which returns a data frame.

In [3]:
import pandas as pd

In [5]:
covid_df = pd.read_csv('italy-covid-daywise.csv')

In [6]:
type(covid_df)

pandas.core.frame.DataFrame
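
As a small illustration of what "read_csv" does, the sketch below parses CSV text held in memory instead of a downloaded file. The name sample_csv is hypothetical; its three rows are copied from the data set shown in this notebook, and the use of io.StringIO is an assumption made only to keep the example self-contained.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-in for 'italy-covid-daywise.csv'
sample_csv = io.StringIO(
    "date,new_cases,new_deaths,new_tests\n"
    "2020-08-30,1444.0,1.0,53541.0\n"
    "2020-08-31,1365.0,4.0,42583.0\n"
    "2020-09-02,975.0,8.0,\n"  # the empty last field is read as NaN
)

# read_csv accepts a file path or any file-like object
sample_df = pd.read_csv(sample_csv)

print(type(sample_df))
print(sample_df)
```

The first line of the text is used as the header row, and the empty field in the last row becomes a missing value (NaN).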

In [7]:
covid_df

           date  new_cases  new_deaths  new_tests
0    2019-12-31        0.0         0.0        NaN
1    2020-01-01        0.0         0.0        NaN
2    2020-01-02        0.0         0.0        NaN
3    2020-01-03        0.0         0.0        NaN
4    2020-01-04        0.0         0.0        NaN
..          ...        ...         ...        ...
243  2020-08-30     1444.0         1.0    53541.0
244  2020-08-31     1365.0         4.0    42583.0
245  2020-09-01      996.0         6.0    54395.0
246  2020-09-02      975.0         8.0        NaN


To view basic information about the data frame, such as column names, non-null counts and data types, we use the ".info" method.

In [8]:
covid_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 248 entries, 0 to 247
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   date        248 non-null    object 
 1   new_cases   248 non-null    float64
 2   new_deaths  248 non-null    float64
 3   new_tests   135 non-null    float64
dtypes: float64(3), object(1)
memory usage: 7.9+ KB
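
The "Non-Null Count" column above shows that "new_tests" has only 135 values for 248 rows, so 113 entries are missing. The same counts can be computed directly, as in this minimal sketch. The small sample_df is a hypothetical stand-in for covid_df.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row stand-in for covid_df
sample_df = pd.DataFrame({
    "date": ["2020-08-30", "2020-08-31", "2020-09-02"],
    "new_tests": [53541.0, 42583.0, np.nan],
})

# Count present and missing values per column
non_null = sample_df.notna().sum()
missing = sample_df.isna().sum()

print(non_null)
print(missing)
```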


Use the ".describe" method to get summary statistics (count, mean, standard deviation, minimum, quartiles and maximum) for the numeric columns.

In [9]:
covid_df.describe()

         new_cases  new_deaths     new_tests
count        248.0       248.0         135.0
mean   1094.818548  143.133065  31699.674074
std    1554.508002  227.105538  11622.209757
min         -148.0       -31.0        7841.0
25%          123.0         3.0       25259.0
50%          342.0        17.0       29545.0
75%        1371.75      175.25       37711.0
max         6557.0       971.0       95273.0
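
To see how the rows of this summary relate to the underlying columns, here is a rough sketch on a hypothetical four-row sample_df (values borrowed from the last rows of the data set). It shows that "count" ignores missing values and that "mean" matches the column's own mean.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for covid_df with one missing value
sample_df = pd.DataFrame({
    "new_cases": [1444.0, 1365.0, 996.0, 975.0],
    "new_tests": [53541.0, 42583.0, 54395.0, np.nan],
})

summary = sample_df.describe()

# Individual statistics are looked up by row label and column name
tests_count = summary.loc["count", "new_tests"]  # NaN is not counted
cases_mean = summary.loc["mean", "new_cases"]

print(summary)
print(tests_count, cases_mean)
```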


The "columns" property contains the list of columns in the data frame.

In [10]:
covid_df.columns

Index(['date', 'new_cases', 'new_deaths', 'new_tests'], dtype='object')

The "shape" property contains the number of rows and columns as a tuple.

In [11]:
covid_df.shape

(248, 4)
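
Because "shape" is an ordinary tuple, it can be unpacked into separate row and column counts. A minimal sketch, again using a hypothetical sample_df rather than the full data set:

```python
import pandas as pd

# Hypothetical stand-in for covid_df
sample_df = pd.DataFrame({
    "date": ["2020-08-30", "2020-08-31", "2020-09-01"],
    "new_cases": [1444.0, 1365.0, 996.0],
})

# shape is (number of rows, number of columns)
n_rows, n_cols = sample_df.shape

# columns is an Index object; tolist() turns it into a plain list
column_names = sample_df.columns.tolist()

print(n_rows, n_cols)
print(column_names)
```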

Retrieving Data From A Data Frame

In [12]:
covid_df['new_cases']

0         0.0
1         0.0
2         0.0
3         0.0
4         0.0
        ...  
243    1444.0
244    1365.0
245     996.0
246     975.0
247    1326.0
Name: new_cases, Length: 248, dtype: float64

In [13]:
type(covid_df['new_cases'])

pandas.core.series.Series

In [14]:
covid_df['new_cases'][200]

np.float64(231.0)

In [15]:
covid_df['new_cases'][123]

np.float64(1965.0)
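
Indexing a column with a number, as above, looks the value up by index label. With the default index, labels and positions coincide, so the difference is easy to miss. The sketch below uses a hypothetical sample_df whose labels start at 243 to make the distinction visible, and uses ".iloc" for position-based access.

```python
import pandas as pd

# Hypothetical stand-in whose index labels do not start at 0
sample_df = pd.DataFrame(
    {"new_cases": [1444.0, 1365.0, 996.0]},
    index=[243, 244, 245],
)

new_cases = sample_df["new_cases"]  # a single column is a Series

by_label = new_cases[244]        # looks up the row labelled 244
by_position = new_cases.iloc[0]  # looks up the first row, whatever its label

print(by_label, by_position)
```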

Use the ".at" accessor to retrieve a single value from a particular row and column.

In [18]:
covid_df.at[134, 'new_cases']

np.float64(1402.0)

In [19]:
covid_df.at[134, 'new_deaths']

np.float64(172.0)
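
For comparison, the same single value can also be read with ".loc", which accepts a row label and a column name as well. This is only a sketch on a hypothetical sample_df; ".at" is restricted to one value at a time, while ".loc" can also select whole rows or ranges.

```python
import pandas as pd

# Hypothetical stand-in for covid_df
sample_df = pd.DataFrame({
    "new_cases": [1444.0, 1365.0, 996.0],
    "new_deaths": [1.0, 4.0, 6.0],
})

value_at = sample_df.at[1, "new_cases"]    # single value by row label and column
value_loc = sample_df.loc[1, "new_cases"]  # same lookup using .loc

print(value_at, value_loc)
```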

In [24]:
cases_df = covid_df[['date', 'new_cases']]

In [25]:
cases_df

           date  new_cases
0    2019-12-31        0.0
1    2020-01-01        0.0
2    2020-01-02        0.0
3    2020-01-03        0.0
4    2020-01-04        0.0
..          ...        ...
243  2020-08-30     1444.0
244  2020-08-31     1365.0
245  2020-09-01      996.0
246  2020-09-02      975.0


In [28]:
deaths_df = covid_df[['date', 'new_deaths']]
deaths_df

           date  new_deaths
0    2019-12-31         0.0
1    2020-01-01         0.0
2    2020-01-02         0.0
3    2020-01-03         0.0
4    2020-01-04         0.0
..          ...         ...
243  2020-08-30         1.0
244  2020-08-31         4.0
245  2020-09-01         6.0
246  2020-09-02         8.0


To make a copy of a data frame, use the ".copy" method. Changes made to the copy do not affect the original.

In [30]:
covid_df_copy = covid_df.copy()
covid_df_copy

           date  new_cases  new_deaths  new_tests
0    2019-12-31        0.0         0.0        NaN
1    2020-01-01        0.0         0.0        NaN
2    2020-01-02        0.0         0.0        NaN
3    2020-01-03        0.0         0.0        NaN
4    2020-01-04        0.0         0.0        NaN
..          ...        ...         ...        ...
243  2020-08-30     1444.0         1.0    53541.0
244  2020-08-31     1365.0         4.0    42583.0
245  2020-09-01      996.0         6.0    54395.0
246  2020-09-02      975.0         8.0        NaN
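
A short sketch of why a copy is useful: editing the copy leaves the original untouched. The names original and duplicate are hypothetical, and the value 100.0 is made up for the demonstration.

```python
import pandas as pd

# Hypothetical two-row data frame
original = pd.DataFrame({"new_cases": [0.0, 996.0]})

# copy() returns an independent data frame by default
duplicate = original.copy()

# Modify the copy only (100.0 is an arbitrary illustrative value)
duplicate.at[0, "new_cases"] = 100.0

print(original)
print(duplicate)
```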


To access a specific row of data, use the ".loc" accessor with the row's index label.

In [31]:
covid_df.loc[234]

date          2020-08-21
new_cases          840.0
new_deaths           6.0
new_tests        44943.0
Name: 234, dtype: object
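
The row returned by ".loc" is a Series whose labels are the column names, which is why the output above ends with "Name: 234". The sketch below, on a hypothetical sample_df, also shows that ".loc" accepts a range of labels and that both ends of the range are included.

```python
import pandas as pd

# Hypothetical stand-in for covid_df
sample_df = pd.DataFrame({
    "date": ["2020-08-30", "2020-08-31", "2020-09-01", "2020-09-02"],
    "new_cases": [1444.0, 1365.0, 996.0, 975.0],
})

row = sample_df.loc[2]     # one row, returned as a Series
rows = sample_df.loc[1:2]  # label range; rows 1 and 2 are both included

print(row)
print(rows)
```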

In [35]:
covid_df.head(34)

         date  new_cases  new_deaths  new_tests
0  2019-12-31        0.0         0.0        NaN
1  2020-01-01        0.0         0.0        NaN
2  2020-01-02        0.0         0.0        NaN
3  2020-01-03        0.0         0.0        NaN
4  2020-01-04        0.0         0.0        NaN
5  2020-01-05        0.0         0.0        NaN
6  2020-01-06        0.0         0.0        NaN
7  2020-01-07        0.0         0.0        NaN
8  2020-01-08        0.0         0.0        NaN
9  2020-01-09        0.0         0.0        NaN


In [36]:
covid_df.tail(90)

           date  new_cases  new_deaths  new_tests
158  2020-06-06      518.0        85.0    34036.0
159  2020-06-07      270.0        72.0    27894.0
160  2020-06-08      197.0        53.0    16301.0
161  2020-06-09      280.0        65.0    32200.0
162  2020-06-10      283.0        79.0    37865.0
..          ...        ...         ...        ...
243  2020-08-30     1444.0         1.0    53541.0
244  2020-08-31     1365.0         4.0    42583.0
245  2020-09-01      996.0         6.0    54395.0
246  2020-09-02      975.0         8.0        NaN
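
The "head" and "tail" methods used above return the first and last n rows. A minimal sketch on a hypothetical ten-row sample_df, which also shows what happens when n is larger than the number of rows:

```python
import pandas as pd

# Hypothetical ten-row data frame with a single made-up column
sample_df = pd.DataFrame({"day": range(10)})

first_three = sample_df.head(3)  # rows 0, 1, 2
last_two = sample_df.tail(2)     # rows 8, 9
too_many = sample_df.head(34)    # only 10 rows exist, so all 10 are returned

print(first_three)
print(last_two)
print(len(too_many))
```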


In [None]:
covid_df.at[0, 'new_cases']  # 'new' on its own is not a column name and would raise a KeyError