# Reading Tabular Data into Data Frames: Solutions

## Reading Other Data

Read the data in `gapminder_gdp_americas.csv` (which should be in the same directory as `gapminder_gdp_oceania.csv`) into a variable called `americas` and display its summary statistics.

To read in a CSV, we use `pd.read_csv` and pass the filename `'../../data/gapminder_gdp_americas.csv'` to it. We also once again pass the column name `'country'` to the parameter `index_col` in order to index by country. The summary statistics can be displayed with the `DataFrame.describe()` method.

In [2]:
import pandas as pd

americas = pd.read_csv('../../data/gapminder_gdp_americas.csv', index_col='country')
americas.describe()

,gdpPercap_1952,gdpPercap_1957,gdpPercap_1962,gdpPercap_1967,gdpPercap_1972,gdpPercap_1977,gdpPercap_1982,gdpPercap_1987,gdpPercap_1992,gdpPercap_1997,gdpPercap_2002,gdpPercap_2007
count,25.0,25.0,25.0,25.0,25.0,25.0,25.0,25.0,25.0,25.0,25.0,25.0
mean,4079.062552,4616.043733,4901.54187,5668.253496,6491.334139,7352.007126,7506.737088,7793.400261,8044.934406,8889.300863,9287.677107,11003.031625
std,3001.727522,3312.381083,3421.740569,4160.88556,4754.404329,5355.602518,5530.490471,6665.039509,7047.089191,7874.225145,8895.817785,9713.209302
min,1397.717137,1544.402995,1662.137359,1452.057666,1654.456946,1874.298931,2011.159549,1823.015995,1456.309517,1341.726931,1270.364932,1201.637154
25%,2428.237769,2487.365989,2750.364446,3242.531147,4031.408271,4756.763836,4258.503604,4140.442097,4439.45084,4684.313807,4858.347495,5728.353514
50%,3048.3029,3780.546651,4086.114078,4643.393534,5305.445256,6281.290855,6434.501797,6360.943444,6618.74305,7113.692252,6994.774861,8948.102923
75%,3939.978789,4756.525781,5180.75591,5788.09333,6809.40669,7674.929108,8997.897412,7807.095818,8137.004775,9767.29753,8797.640716,11977.57496
max,13990.48208,14847.12712,16173.14586,19530.36557,21806.03594,24072.63213,25009.55914,29884.35041,32003.93224,35767.43303,39097.09955,42951.65309
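
The same pattern can be tried without the gapminder files. The following is a minimal sketch, assuming a small made-up table (the country names, column names, and values are hypothetical and are not part of the gapminder data) that is read from an in-memory string instead of a file on disk:

```python
import io

import pandas as pd

# Hypothetical three-row CSV; names and values are invented for illustration.
csv_text = """country,gdp_1952,gdp_1957
Aland,1000.0,1500.0
Borduria,2000.0,2500.0
Cascadia,3000.0,6500.0
"""

# index_col='country' turns the 'country' column into the row labels,
# so it no longer appears among the data columns.
toy = pd.read_csv(io.StringIO(csv_text), index_col='country')

# describe() reports count, mean, std, min, quartiles, and max per numeric column.
summary = toy.describe()
print(summary)
```

Because `country` becomes the index, `describe()` summarizes only the remaining numeric columns.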


## Inspecting Data

After reading the data for the Americas, use `help(americas.head)` and `help(americas.tail)` to find out what `DataFrame.head` and `DataFrame.tail` do.

1. What method call will display the first three rows of this data?
1. What method call will display the last three columns of this data? (Hint: you may need to change your view of the data.)

1. Executing `americas.head()` displays the first five rows of `americas`, since `n` defaults to 5. To see a different number of rows, pass the parameter `n` in the call to `americas.head()`. To view the first three rows, execute:

In [3]:
americas.head(n=3)

country,continent,gdpPercap_1952,gdpPercap_1957,gdpPercap_1962,gdpPercap_1967,gdpPercap_1972,gdpPercap_1977,gdpPercap_1982,gdpPercap_1987,gdpPercap_1992,gdpPercap_1997,gdpPercap_2002,gdpPercap_2007
Argentina,Americas,5911.315053,6856.856212,7133.166023,8052.953021,9443.038526,10079.02674,8997.897412,9139.671389,9308.41871,10967.28195,8797.640716,12779.37964
Bolivia,Americas,2677.326347,2127.686326,2180.972546,2586.886053,2980.331339,3548.097832,3156.510452,2753.69149,2961.699694,3326.143191,3413.26269,3822.137084
Brazil,Americas,2108.944355,2487.365989,3336.585802,3429.864357,4985.711467,6660.118654,7030.835878,7807.095818,6950.283021,7957.980824,8131.212843,9065.800825


2. To view the last three rows of `americas`, we would use `americas.tail(n=3)`, analogous to the `head()` call above. Here, however, we want the last three columns, so we need to change our view of the data before calling `tail()`. To do so, we create a new DataFrame in which rows and columns are switched:

In [4]:
americas_flipped = americas.T

We can then view the last three columns of `americas` by viewing the last three rows of `americas_flipped`:

In [5]:
americas_flipped.tail(n=3)

country,Argentina,Bolivia,Brazil,Canada,Chile,Colombia,Costa Rica,Cuba,Dominican Republic,Ecuador,...,Mexico,Nicaragua,Panama,Paraguay,Peru,Puerto Rico,Trinidad and Tobago,United States,Uruguay,Venezuela
gdpPercap_1997,10967.28195,3326.143191,7957.980824,28954.92589,10118.05318,6117.361746,6677.045314,5431.990415,3614.101285,7429.455877,...,9767.29753,2253.023004,7113.692252,4247.400261,5838.347657,16999.4333,8792.573126,35767.43303,9230.240708,10165.49518
gdpPercap_2002,8797.640716,3413.26269,8131.212843,33328.96507,10778.78385,5755.259962,7723.447195,6340.646683,4563.808154,5773.044512,...,10742.44053,2474.548819,7356.031934,3783.674243,5909.020073,18855.60618,11460.60023,39097.09955,7727.002004,8605.047831
gdpPercap_2007,12779.37964,3822.137084,9065.800825,36319.23501,13171.63885,7006.580419,9645.06142,8948.102923,6025.374752,6873.262326,...,11977.57496,2749.320965,9809.185636,4172.838464,7408.905561,19328.70901,18008.50924,42951.65309,10611.46299,11415.80569


This shows the data that we want, but we may prefer to display three columns instead of three rows, so we can flip it back:

In [6]:
americas_flipped.tail(n=3).T

country,gdpPercap_1997,gdpPercap_2002,gdpPercap_2007
Argentina,10967.28195,8797.640716,12779.37964
Bolivia,3326.143191,3413.26269,3822.137084
Brazil,7957.980824,8131.212843,9065.800825
Canada,28954.92589,33328.96507,36319.23501
Chile,10118.05318,10778.78385,13171.63885
Colombia,6117.361746,5755.259962,7006.580419
Costa Rica,6677.045314,7723.447195,9645.06142
Cuba,5431.990415,6340.646683,8948.102923
Dominican Republic,3614.101285,4563.808154,6025.374752
Ecuador,7429.455877,5773.044512,6873.262326


**Note:** we could have done the above in a single line of code by ‘chaining’ the commands:

In [7]:
americas.T.tail(n=3).T

country,gdpPercap_1997,gdpPercap_2002,gdpPercap_2007
Argentina,10967.28195,8797.640716,12779.37964
Bolivia,3326.143191,3413.26269,3822.137084
Brazil,7957.980824,8131.212843,9065.800825
Canada,28954.92589,33328.96507,36319.23501
Chile,10118.05318,10778.78385,13171.63885
Colombia,6117.361746,5755.259962,7006.580419
Costa Rica,6677.045314,7723.447195,9645.06142
Cuba,5431.990415,6340.646683,8948.102923
Dominican Republic,3614.101285,4563.808154,6025.374752
Ecuador,7429.455877,5773.044512,6873.262326
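
The transpose approach can be compared with positional column slicing, which is a different technique from the one this exercise asks for. The following is a minimal sketch, assuming a small made-up DataFrame (all names and values are hypothetical), showing that `DataFrame.iloc` selects the same three columns:

```python
import pandas as pd

# Hypothetical all-numeric DataFrame indexed by country.
toy = pd.DataFrame(
    {
        'gdp_1992': [1.0, 2.0],
        'gdp_1997': [3.0, 4.0],
        'gdp_2002': [5.0, 6.0],
        'gdp_2007': [7.0, 8.0],
    },
    index=pd.Index(['Aland', 'Borduria'], name='country'),
)

# Approach from the exercise: flip, take the last three rows, flip back.
via_transpose = toy.T.tail(n=3).T

# Alternative: select every row and the last three columns by position.
via_iloc = toy.iloc[:, -3:]

print(via_iloc)
```

One reason to consider the `iloc` form is that transposing a DataFrame whose columns have mixed types (for example, a text column next to numeric ones) generally produces columns of `object` dtype, whereas positional slicing leaves each column's dtype unchanged.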


## Reading Files in Other Directories

The data for your current project is stored in a file called `microbes.csv`, which is located in a folder called `field_data`. You are doing analysis in a notebook called `analysis.ipynb` in a sibling folder called `thesis`:

```
your_home_directory
+-- field_data/
|   +-- microbes.csv
+-- thesis/
    +-- analysis.ipynb
```

What value(s) should you pass to `read_csv` to read `microbes.csv` in `analysis.ipynb`?

We need to specify the path to the file of interest in the call to `pd.read_csv`. Relative paths are resolved from the directory containing the notebook, so we first ‘jump’ out of the folder `thesis` using `'../'` and then into the folder `field_data` using `'field_data/'`. Then we can specify the filename `microbes.csv`. The result is as follows:

```python
data_microbes = pd.read_csv('../field_data/microbes.csv')
```
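
To see how the `..` component is resolved, the folder layout can be recreated in a temporary directory. The following is a minimal sketch, assuming made-up file contents (the `sample` and `count` columns are hypothetical) and building the path with `pathlib` so that it does not depend on the current working directory:

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as home:
    # Recreate the sibling folders from the exercise.
    field_data = Path(home) / 'field_data'
    thesis = Path(home) / 'thesis'
    field_data.mkdir()
    thesis.mkdir()

    # Hypothetical file contents, invented for illustration.
    (field_data / 'microbes.csv').write_text('sample,count\na,1\nb,2\n')

    # Starting from thesis/, go up one level and then down into field_data/.
    relative_path = thesis / '..' / 'field_data' / 'microbes.csv'
    data_microbes = pd.read_csv(relative_path)

print(data_microbes)
```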

## Writing Data

As well as the `read_csv` function for reading data from a file, pandas provides a `to_csv` method to write DataFrames to files. Applying what you’ve learned about reading from files, write one of your DataFrames to a file called `processed.csv`. You can use `help` to get information on how to use `to_csv`.

In order to write the DataFrame `americas` to a file called `processed.csv`, execute the following command:


In [8]:
americas.to_csv('processed.csv')
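
To check what was written, the file can be read back in. The following is a minimal sketch, assuming a small made-up DataFrame (names and values are hypothetical) and writing to a temporary directory so that no file is left behind:

```python
import os
import tempfile

import pandas as pd

# Hypothetical DataFrame indexed by country.
toy = pd.DataFrame(
    {'gdp_2002': [1.5, 2.25], 'gdp_2007': [3.0, 4.75]},
    index=pd.Index(['Aland', 'Borduria'], name='country'),
)

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, 'processed.csv')

    # By default to_csv writes the index as the first column of the file.
    toy.to_csv(out_path)

    # Peek at the header line of the file that was written.
    with open(out_path) as handle:
        header_line = handle.readline().strip()

    # Read the file back, restoring 'country' as the index.
    restored = pd.read_csv(out_path, index_col='country')

print(header_line)
print(restored)
```

Since the index is written as an ordinary column, passing `index_col='country'` when reading the file back restores the original layout.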

For help on `to_csv`, you could execute, for example:



In [9]:
help(americas.to_csv)


Help on method to_csv in module pandas.core.generic:

to_csv(path_or_buf: 'FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None' = None, sep: 'str' = ',', na_rep: 'str' = '', float_format: 'str | Callable | None' = None, columns: 'Sequence[Hashable] | None' = None, header: 'bool_t | list[str]' = True, index: 'bool_t' = True, index_label: 'IndexLabel | None' = None, mode: 'str' = 'w', encoding: 'str | None' = None, compression: 'CompressionOptions' = 'infer', quoting: 'int | None' = None, quotechar: 'str' = '"', lineterminator: 'str | None' = None, chunksize: 'int | None' = None, date_format: 'str | None' = None, doublequote: 'bool_t' = True, escapechar: 'str | None' = None, decimal: 'str' = '.', errors: 'str' = 'strict', storage_options: 'StorageOptions' = None) -> 'str | None' method of pandas.core.frame.DataFrame instance
    Write object to a comma-separated values (csv) file.
    
    Parameters
    ----------
    path_or_buf : str, path object, file-like object, or None, defaul

Note that `help(to_csv)` raises a `NameError`. The subtlety is that `to_csv` is not a standalone function: it is a method of DataFrame objects, so it must be reached through a DataFrame, as in `americas.to_csv`.
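
The difference can be reproduced in a few lines. The following is a minimal sketch that catches the error from the bare name and then looks the method up on the `pd.DataFrame` class, which works even when no DataFrame has been created yet (the variable names are illustrative only):

```python
import pandas as pd

# The bare name to_csv is not defined anywhere, so looking it up fails
# before help() is even called.
try:
    help(to_csv)
except NameError as err:
    error_message = str(err)
    print(error_message)

# The method lives on the DataFrame class, so it can be reached from there.
doc_first_line = pd.DataFrame.to_csv.__doc__.strip().splitlines()[0]
print(doc_first_line)
```

Calling `help(pd.DataFrame.to_csv)` would display the full documentation in the same way as `help(americas.to_csv)`.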



Licensed under [CC-BY 4.0](http://swcarpentry.github.io/python-novice-gapminder/07-reading-tabular/index.html) 2018–2023 by [The Carpentries](https://carpentries.org/)

Licensed under [CC-BY 4.0](http://swcarpentry.github.io/python-novice-gapminder/07-reading-tabular/index.html) 2016–2018 by [Software Carpentry Foundation](https://software-carpentry.org/)