# CSV Files 


**Questions**:
- "How do I open a csv file and read its contents?"
- "How do I write a csv file with the variables I generated?"

**Learning Objectives**:
- "Understand how to read/write csv files."
* * * * *


# Reading/Writing csv files using `pandas`

Reading in a dataset that is stored as a "comma-separated values" (csv) file is easy in Python using the `pandas` package. Central to the `pandas` package is the `DataFrame` type, which stores 2-dimensional tabular data in a format similar to Excel spreadsheets.

Let's import `pandas` and use its `read_csv()` function to load the data stored in a csv file into a `DataFrame`.

In [None]:
# You might need to install the pandas library first.
# Uncomment the line below and run this cell to install it:
# !pip install pandas

In [None]:
import pandas as pd
caps = pd.read_csv('capitals.csv')
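
If `capitals.csv` is not at hand, the same call can be tried on text held in memory. This is a minimal sketch: the three rows below are a made-up stand-in, not the contents of the real `capitals.csv`, and the names `csv_text` and `sample` are only for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for the contents of a csv file (not the real capitals.csv).
csv_text = """Country,Capital
France,Paris
Japan,Tokyo
Kenya,Nairobi
"""

# read_csv accepts a file path or a file-like object such as StringIO.
# The first line of the text is used as the column names.
sample = pd.read_csv(io.StringIO(csv_text))
print(type(sample))
print(sample)
```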

We can look at the first 5 rows of data using the `.head()` method of the `DataFrame` object. Passing a number, as in `.head(10)`, shows that many rows instead.

In [None]:
caps.head()
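
As a small sketch of how the row count works, the cell below uses a made-up `DataFrame` called `numbers` (an assumption for illustration, not part of the lesson data). The related `.tail()` method does the same from the end of the table.

```python
import pandas as pd

# Hypothetical example data; any DataFrame behaves the same way.
numbers = pd.DataFrame({'n': range(10), 'square': [i ** 2 for i in range(10)]})

first_three = numbers.head(3)   # first 3 rows
default_head = numbers.head()   # first 5 rows when no number is given
last_two = numbers.tail(2)      # last 2 rows
print(first_three)
print(last_two)
```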

To see how many data points (rows) and variables (columns) exist in the dataframe, we can use the `.shape` attribute, which is a `(rows, columns)` tuple.

In [None]:
caps.shape
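
Because `.shape` is an ordinary tuple, it can be unpacked into two variables. A minimal sketch on a made-up `DataFrame` named `toy` (hypothetical, for illustration only):

```python
import pandas as pd

# Hypothetical toy data: 3 rows and 2 columns.
toy = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})

# Unpack the (rows, columns) tuple into two names.
n_rows, n_cols = toy.shape
print(f"{n_rows} rows, {n_cols} columns")
```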

Or we can get more detailed information about the number of entries (i.e. observations or data points) and the variables for each entry using the `.info()` method.

In [None]:
caps.info()

It looks like there is a single missing value in the Capital variable (there are 199 non-null objects, not 200). Let's remove the row containing that missing value (shown as `NaN`, or `na`) using the `dropna()` method so that we can save an updated version of the csv file.

In [None]:
caps_nomissing = caps.dropna()
caps_nomissing.info()
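
Note that `dropna()` with no arguments removes every row that has a missing value in any column. The sketch below, on made-up data (the `toy` frame and its values are assumptions for illustration), counts missing values per column with `.isna().sum()` and uses the `subset` parameter to drop rows only when a particular column is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing capital and one missing population.
toy = pd.DataFrame({'Country': ['France', 'Japan', 'Kenya'],
                    'Capital': ['Paris', np.nan, 'Nairobi'],
                    'Population': [68.0, 125.0, np.nan]})

# Number of missing values in each column.
missing_per_column = toy.isna().sum()
print(missing_per_column)

all_complete = toy.dropna()                    # drops rows with ANY missing value
has_capital = toy.dropna(subset=['Capital'])   # drops rows only if Capital is missing
print(len(all_complete), len(has_capital))
```

Neither call changes `toy` itself; each returns a new `DataFrame`, which is why the lesson assigns the result to a new name.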

That looks better. Now let's write this updated `DataFrame` out to a csv file.

In [None]:
caps_nomissing.to_csv('capitals_nomissing.csv')
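
By default, `to_csv()` also writes the row index as an unnamed first column, which then reappears as an extra column when the file is read back. The sketch below shows the difference that `index=False` makes. It writes to a temporary directory so nothing is left on disk; the `toy` data and the file name `toy.csv` are hypothetical.

```python
import os
import tempfile
import pandas as pd

# Hypothetical toy data for a write/read round trip.
toy = pd.DataFrame({'Country': ['France', 'Japan'], 'Capital': ['Paris', 'Tokyo']})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'toy.csv')

    toy.to_csv(path)                 # default: the index is written as a first column
    with_index = pd.read_csv(path)

    toy.to_csv(path, index=False)    # leave the index out of the file
    without_index = pd.read_csv(path)

print(list(with_index.columns))
print(list(without_index.columns))
```

Whether to keep the index depends on the data: for a default 0, 1, 2, ... index it usually carries no information, so `index=False` tends to give a cleaner file.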

For more information on using `pandas`, come to the D-Lab's workshop titled "Introduction to Pandas". Here's a [link](https://github.com/dlab-berkeley/introduction-to-pandas) to the GitHub repo containing the course materials.



## Challenge 1: Writing a CSV file

Below is a `pandas` `DataFrame` created from a dictionary of lists representing various information about US states. Write this [object](https://github.com/dlab-berkeley/python-intensive/blob/master/Glossary.md#object) as a CSV file called `states.csv`.

In [None]:
import pandas as pd
import numpy as np
states = pd.DataFrame( {'state': ['Ohio', 'Michigan', 'California', 'Florida', 'Alabama'],
                        'population': [11.6, 9.9, 39.1, 20.2, 4.9], 
                        'year in union': [1803, 1837, 1850, 1845, 1819],  # Florida joined in 1845
                        'state bird': ['Northern cardinal', np.nan, np.nan, np.nan, np.nan], 
                        'capital': ['Columbus', 'Lansing', 'Sacramento', 'Tallahassee', 'Montgomery']})
states

## Challenge 2.
Replace the `NaN` values in the 'state bird' column with the values: ['Southern cardinal', 'Eastern cardinal', 'Western cardinal', 'Center cardinal']. Then rewrite `states.csv` with the new dataframe.

## Challenge 3.
Delete the 'year in union' column using the `.drop()` method and print the dataframe's summary statistics using the `.describe()` method.

Source: https://github.com/dlab-berkeley/Python-Fundamentals