# Exercise 6: Reading Tabular Data into DataFrames

## Aim: Learn what DataFrames are and practice using them.

### Issues covered:
- Importing the `pandas` library
- Using `pandas` to load a simple CSV data set
- Getting information about the DataFrames we make

## 1. Let's import `pandas` and make some DataFrames.

Import `pandas`, then create a dataframe using the `data/weather.csv` file and print it out.

In [1]:
import pandas as pd
weather_df = pd.read_csv("../data/weather.csv")
print(weather_df)

         Date   Time  Temp  Rainfall
0  2014-01-01  00:00  2.34      4.45
1  2014-01-01  12:00  6.70      8.34
2  2014-01-02  00:00 -1.34     10.25


Create a new dataframe indexed by `Date` and print it.

In [2]:
date_weather_df = pd.read_csv("../data/weather.csv", index_col="Date")
print(date_weather_df)

             Time  Temp  Rainfall
Date                             
2014-01-01  00:00  2.34      4.45
2014-01-01  12:00  6.70      8.34
2014-01-02  00:00 -1.34     10.25
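
The two `read_csv` calls above can be tried without the file on disk. The sketch below is a minimal illustration: it assumes the CSV holds exactly the three rows printed above, and the names `csv_text`, `sample_df` and `sample_by_date` are made up for this example.

```python
import io

import pandas as pd

# Inline copy of the three rows printed above, standing in for data/weather.csv.
csv_text = """Date,Time,Temp,Rainfall
2014-01-01,00:00,2.34,4.45
2014-01-01,12:00,6.70,8.34
2014-01-02,00:00,-1.34,10.25
"""

# Default behaviour: pandas generates an integer index (0, 1, 2).
sample_df = pd.read_csv(io.StringIO(csv_text))

# index_col="Date" uses the Date column as the row labels instead.
sample_by_date = pd.read_csv(io.StringIO(csv_text), index_col="Date")

print(sample_df.index)
print(sample_by_date.index)

# Label-based lookup: 2014-01-01 appears twice, so two rows come back.
print(sample_by_date.loc["2014-01-01"])
```

Because `Date` is read as plain text here, the index labels are strings rather than timestamps.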


## 2. Let's practice using some dataframe methods.

What is the memory usage of the dataframe in bytes?

In [3]:
print('data frame of weather.csv, below:\n')
weather_df.info()
print('\n'+'data frame with date as index, below:\n')
date_weather_df.info()

data frame of weather.csv, below:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Date      3 non-null      object 
 1   Time      3 non-null      object 
 2   Temp      3 non-null      float64
 3   Rainfall  3 non-null      float64
dtypes: float64(2), object(2)
memory usage: 228.0+ bytes

data frame with date as index, below:

<class 'pandas.core.frame.DataFrame'>
Index: 3 entries, 2014-01-01 to 2014-01-02
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Time      3 non-null      object 
 1   Temp      3 non-null      float64
 2   Rainfall  3 non-null      float64
dtypes: float64(2), object(1)
memory usage: 96.0+ bytes
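
The `+` after the byte count in the `info()` output means the figure is a lower bound: for `object` columns only the references are counted, not the strings themselves. A hedged sketch of getting a fuller figure with `memory_usage(deep=True)` follows. The frame is rebuilt inline from the rows shown earlier, and the exact totals depend on the pandas and Python versions, so none are quoted here.

```python
import pandas as pd

# Rebuild the weather data inline so the example runs without the CSV file.
sample_df = pd.DataFrame({
    "Date": ["2014-01-01", "2014-01-01", "2014-01-02"],
    "Time": ["00:00", "12:00", "00:00"],
    "Temp": [2.34, 6.70, -1.34],
    "Rainfall": [4.45, 8.34, 10.25],
})

# Bytes per column (plus the index); text columns may be undercounted here.
shallow = sample_df.memory_usage(index=True)

# deep=True also inspects the contents of the text columns.
deep = sample_df.memory_usage(index=True, deep=True)

print(shallow)
print("shallow total:", shallow.sum())
print("deep total:", deep.sum())
```

`weather_df.info(memory_usage="deep")` reports the same deep total inside the usual `info()` summary.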


What command can you use to find the dataframe's column names?

In [11]:
weather_df_column = weather_df.columns
print(weather_df_column)

Index(['Date', 'Time', 'Temp', 'Rainfall'], dtype='object')
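
`columns` returns a pandas `Index` object rather than a plain list. The sketch below, which uses an empty frame with the same column names purely for illustration, shows how to turn it into a list, and why `+` with a string should be used with care: it is applied to every label rather than to the printed text.

```python
import pandas as pd

# An empty frame with the same column names is enough to illustrate the point.
sample_df = pd.DataFrame(columns=["Date", "Time", "Temp", "Rainfall"])

# Convert the Index of labels into an ordinary Python list.
column_names = sample_df.columns.tolist()
print(column_names)  # ['Date', 'Time', 'Temp', 'Rainfall']

# Adding a string to an Index of strings is element-wise:
# every label gets the suffix, and a new Index is returned.
suffixed = sample_df.columns + "_x"
print(list(suffixed))  # ['Date_x', 'Time_x', 'Temp_x', 'Rainfall_x']
```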


Swap the rows and columns and `print` the result.

In [19]:
tdw_df = date_weather_df.transpose()
print(tdw_df)

Date     2014-01-01 2014-01-01 2014-01-02
Time          00:00      12:00      00:00
Temp           2.34        6.7      -1.34
Rainfall       4.45       8.34      10.25
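
`transpose()` also has the shorthand `.T`. A small sketch, with the date-indexed frame rebuilt inline from the rows shown earlier (the name `sample_by_date` is made up for this example):

```python
import pandas as pd

# Date-indexed weather data, rebuilt inline from the output shown earlier.
sample_by_date = pd.DataFrame(
    {
        "Time": ["00:00", "12:00", "00:00"],
        "Temp": [2.34, 6.70, -1.34],
        "Rainfall": [4.45, 8.34, 10.25],
    },
    index=pd.Index(["2014-01-01", "2014-01-01", "2014-01-02"], name="Date"),
)

# .T is an alias for .transpose(): rows become columns and vice versa.
transposed = sample_by_date.T
print(transposed)

# Each new column now mixes text (Time) and numbers (Temp, Rainfall),
# so it is worth checking the dtypes before doing arithmetic on it.
print(transposed.dtypes)

# Transposing again restores the original orientation.
print(transposed.T.shape)
```

Mixed columns like these are one reason to keep the original orientation for calculations and transpose only for viewing.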


Find the mean and standard deviation of the weather data.

In [38]:
print(date_weather_df)
tempa = date_weather_df[["Temp", "Rainfall"]]
print(tempa)
print(tempa.std())

             Time  Temp  Rainfall
Date                             
2014-01-01  00:00  2.34      4.45
2014-01-01  12:00  6.70      8.34
2014-01-02  00:00 -1.34     10.25
            Temp  Rainfall
Date                      
2014-01-01  2.34      4.45
2014-01-01  6.70      8.34
2014-01-02 -1.34     10.25
Temp        4.024790
Rainfall    2.955791
dtype: float64
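
The question asks for both the mean and the standard deviation. One possible way to get both in a single call is `agg`, sketched below on an inline copy of the numeric columns (the name `sample_df` is made up for this example). Note that `std()` in pandas uses the sample standard deviation (`ddof=1`) by default.

```python
import pandas as pd

# Numeric weather columns, rebuilt inline from the output shown above.
sample_df = pd.DataFrame({
    "Temp": [2.34, 6.70, -1.34],
    "Rainfall": [4.45, 8.34, 10.25],
})

# One statistic at a time.
print(sample_df.mean())
print(sample_df.std())

# Or both at once: one row per statistic, one column per variable.
summary = sample_df.agg(["mean", "std"])
print(summary)
```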


## 3. Extension: Some Dataframe Challenges

1. Find the first three rows of data in `data/americas_gdp.csv` using `head()`.

In [41]:
import pandas as pd
usa = pd.read_csv('../data/americas_gdp.csv')
usa.head(3)

Unnamed: 0,continent,country,gdpPercap_1952,gdpPercap_1957,gdpPercap_1962,gdpPercap_1967,gdpPercap_1972,gdpPercap_1977,gdpPercap_1982,gdpPercap_1987,gdpPercap_1992,gdpPercap_1997,gdpPercap_2002,gdpPercap_2007
0,Americas,Argentina,5911.315053,6856.856212,7133.166023,8052.953021,9443.038526,10079.02674,8997.897412,9139.671389,9308.41871,10967.28195,8797.640716,12779.37964
1,Americas,Bolivia,2677.326347,2127.686326,2180.972546,2586.886053,2980.331339,3548.097832,3156.510452,2753.69149,2961.699694,3326.143191,3413.26269,3822.137084
2,Americas,Brazil,2108.944355,2487.365989,3336.585802,3429.864357,4985.711467,6660.118654,7030.835878,7807.095818,6950.283021,7957.980824,8131.212843,9065.800825
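
The `Unnamed: 0` column in this output suggests the file's first column has no header, so pandas invents a name for it. The sketch below assumes that layout and uses only a few of the columns and rows shown above; passing `index_col=0` treats that first column as the row index instead.

```python
import io

import pandas as pd

# A cut-down stand-in for americas_gdp.csv: three rows and two GDP columns
# copied from the output above. The blank first header is an assumption.
csv_text = """,continent,country,gdpPercap_1952,gdpPercap_2007
0,Americas,Argentina,5911.315053,12779.37964
1,Americas,Bolivia,2677.326347,3822.137084
2,Americas,Brazil,2108.944355,9065.800825
"""

# Without index_col, the unlabeled column becomes 'Unnamed: 0'.
plain = pd.read_csv(io.StringIO(csv_text))
print(plain.columns.tolist())

# With index_col=0, the same column becomes the row index.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(indexed.columns.tolist())

# head(n) returns the first n rows; with no argument it returns the first five.
print(indexed.head(2))
```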


2. Find the last 3 **columns** of data.

_Hint: You may need to change your view of the data before you can use `tail()`._

In [43]:
usa.tail(3)

Unnamed: 0,continent,country,gdpPercap_1952,gdpPercap_1957,gdpPercap_1962,gdpPercap_1967,gdpPercap_1972,gdpPercap_1977,gdpPercap_1982,gdpPercap_1987,gdpPercap_1992,gdpPercap_1997,gdpPercap_2002,gdpPercap_2007
22,Americas,United States,13990.48208,14847.12712,16173.14586,19530.36557,21806.03594,24072.63213,25009.55914,29884.35041,32003.93224,35767.43303,39097.09955,42951.65309
23,Americas,Uruguay,5716.766744,6150.772969,5603.357717,5444.61962,5703.408898,6504.339663,6920.223051,7452.398969,8137.004775,9230.240708,7727.002004,10611.46299
24,Americas,Venezuela,7689.799761,9802.466526,8422.974165,9541.474188,10505.25966,13143.95095,11152.41011,9883.584648,10733.92631,10165.49518,8605.047831,11415.80569
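
Called on the frame as loaded, `tail(3)` returns the last three rows, not the last three columns. Following the hint, a sketch of two ways to get the last three columns is given below: transpose first and then call `tail(3)`, or select by position with `iloc`. The small frame uses values copied from the rows shown above; the name `sample` is made up for this example.

```python
import pandas as pd

# Three countries and four GDP columns copied from the output above.
sample = pd.DataFrame(
    {
        "country": ["United States", "Uruguay", "Venezuela"],
        "gdpPercap_1992": [32003.93224, 8137.004775, 10733.92631],
        "gdpPercap_1997": [35767.43303, 9230.240708, 10165.49518],
        "gdpPercap_2002": [39097.09955, 7727.002004, 8605.047831],
        "gdpPercap_2007": [42951.65309, 10611.46299, 11415.80569],
    },
    index=[22, 23, 24],
)

# Option 1: transpose so columns become rows, then take the last three rows.
last_cols_transposed = sample.T.tail(3)
print(last_cols_transposed)

# Option 2: positional selection, all rows and the last three columns.
last_cols = sample.iloc[:, -3:]
print(last_cols)
```

The first option leaves the result transposed; the second keeps the original orientation and the numeric dtypes.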


3. Use `help(data_americas.to_csv)` to figure out how writing to a CSV file works.

In [1]:
#help(usa.to_csv)

4. Try writing to a CSV file using the code below (giving your own filename). Take a look in the data folder and check it's there.
```
data_americas.to_csv('data/new_file_name.csv')
```

In [47]:
usa.to_csv("../data/new_file_name.csv")
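
By default `to_csv` also writes the row index as an unlabeled first column, which shows up as `Unnamed: 0` when the file is read back. The sketch below illustrates this with a round trip through a temporary directory, so it does not touch the data folder; the frame and file name are made up for the example.

```python
import os
import tempfile

import pandas as pd

# A tiny frame with values copied from the GDP output shown earlier.
sample = pd.DataFrame({
    "country": ["Argentina", "Bolivia"],
    "gdpPercap_2007": [12779.37964, 3822.137084],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "new_file_name.csv")

    # Default: the index (0, 1) is written as an extra first column.
    sample.to_csv(path)
    with_index = pd.read_csv(path)

    # index=False leaves the index out of the file.
    sample.to_csv(path, index=False)
    without_index = pd.read_csv(path)

print(with_index.columns.tolist())     # ['Unnamed: 0', 'country', 'gdpPercap_2007']
print(without_index.columns.tolist())  # ['country', 'gdpPercap_2007']
```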