# Introduction
Welcome to the **[Learn Pandas](https://www.kaggle.com/learn/pandas)** track. These hands-on exercises are aimed at someone who has worked with Pandas a little before.
Each page has a list of relevant resources you can use if you get stumped. The top item in each list has been custom-made to help you with the exercises on that page.

The first step in most data analytics projects is reading the data file. In this section, you'll create `Series` and `DataFrame` objects, both by hand and by reading data files.
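
As a warm-up, here is a minimal sketch of building both object types by hand. The data and the variable names (`temps`, `weather`) are made up for illustration and are not part of the exercises.

```python
import pandas as pd

# Hypothetical example data, unrelated to the exercises below.
# A Series is a single labelled column of values with an optional name.
temps = pd.Series([18.5, 21.0, 19.2], index=['Mon', 'Tue', 'Wed'], name='Temperature')

# A DataFrame built from a dict: keys become column names,
# and each list supplies that column's values.
weather = pd.DataFrame(
    {'Temperature': [18.5, 21.0, 19.2], 'Rain': [True, False, False]},
    index=['Mon', 'Tue', 'Wed'],
)

print(temps)
print(weather)
```

Passing `index=` replaces the default 0, 1, 2, ... row labels with your own.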

# Relevant Resources
* **[Creating, Reading and Writing Reference](https://www.kaggle.com/residentmario/creating-reading-and-writing-reference)**
* [General Pandas Cheat Sheet](https://assets.datacamp.com/blog_assets/PandasPythonForDataScience.pdf)

# Set Up

Run the code cell below to load the libraries you will need (including code to check your answers).

In [1]:
import pandas as pd
pd.set_option('display.max_rows', 5)  # full option name; the bare 'max_rows' is ambiguous in newer pandas
from learntools.core import binder; binder.bind(globals())
from learntools.pandas.creating_reading_and_writing import *
print("Setup complete.")

Setup complete.


# Checking Answers

You can check your answer to each of the exercises that follow by calling `qN.check()` (replacing `N` with the number of the exercise). The `qN` objects are loaded by the setup cell above. For example, here's how you would check your answer to exercise 1:

In [2]:
#q1.check()

If your answer is right, `qN.check()` displays `Correct`. If it is wrong, it displays `Incorrect` along with a short explanation.

If you get stuck, you can call `qN.solution()` to print the answer outright.

# Exercises

## 1.

In the cell below, create a DataFrame `fruits` that looks like this:

![](https://i.imgur.com/Ax3pp2A.png)

In [3]:
# Your code goes here. Create a dataframe matching the above diagram and assign it to the variable fruits.
fruits = None

q1.check()
fruits

<span style="color:#ccaa33">Check:</span> When you've updated the starter code, `check()` will tell you whether your code is correct. You need to update the code that creates variable `fruits`

In [4]:
#%%RM_IF(PROD)%%
# Correct solution
dat = [[30, 21]]
cols = ['Apples', 'Bananas']
fruits = pd.DataFrame(dat, columns=cols)

q1.check()
fruits

<span style="color:#33cc33">Correct</span>

| | Apples | Bananas |
|---|---|---|
| 0 | 30 | 21 |


In [5]:
#%%RM_IF(PROD)%%
# Correct solution 2
fruits = pd.DataFrame({'Apples': [30], 'Bananas': [21]})

q1.check()
fruits

<span style="color:#33cc33">Correct</span>

| | Apples | Bananas |
|---|---|---|
| 0 | 30 | 21 |


In [6]:
#%%RM_IF(PROD)%%
# Incorrect (wrong dtype)
fruits = pd.DataFrame({'Apples': [30.], 'Bananas': [21.]})

q1.check()
fruits

<span style="color:#cc3333">Incorrect:</span> Incorrect value for dataframe `fruits`

| | Apples | Bananas |
|---|---|---|
| 0 | 30.0 | 21.0 |


In [7]:
#%%RM_IF(PROD)%%
# Incorrect (wrong column name)
fruits = pd.DataFrame({'Apples': [30], 'bananas': [21]})

q1.check()
fruits

<span style="color:#cc3333">Incorrect:</span> Expected dataframe `fruits` to have column `Bananas`

| | Apples | bananas |
|---|---|---|
| 0 | 30 | 21 |


In [8]:
# Uncomment the line below to see a solution
#_COMMENT_IF(PROD)_
q1.solution()

<span style="color:#33cc99">Solution:</span> 
```python
fruits = pd.DataFrame([[30, 21]], columns=['Apples', 'Bananas'])
```
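
The "wrong dtype" attempt above is rejected even though its numbers look the same. The sketch below illustrates the difference using plain pandas comparisons. How the checker itself compares frames is not shown in this workbook, so treat this only as an illustration of why integer and float columns are not interchangeable. The names `ints` and `floats` are made up for the example.

```python
import pandas as pd

ints = pd.DataFrame({'Apples': [30], 'Bananas': [21]})
floats = pd.DataFrame({'Apples': [30.], 'Bananas': [21.]})

# Each column carries a dtype: integers in the first frame, floats in the second.
print(ints.dtypes)
print(floats.dtypes)

# Element by element the values compare equal...
values_match = bool((ints == floats).all().all())

# ...but DataFrame.equals also requires matching dtypes, so it reports a difference.
frames_identical = ints.equals(floats)

print(values_match, frames_identical)
```

If you ever need to convert, `floats.astype(int)` returns a copy with integer columns.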

## 2.

Create a dataframe `fruit_sales` that matches the diagram below:

![](https://i.imgur.com/CHPn7ZF.png)

In [9]:
# Your code goes here. Create a dataframe matching the above diagram and assign it to the variable fruit_sales.
fruit_sales = None

q2.check()
fruit_sales

<span style="color:#ccaa33">Check:</span> When you've updated the starter code, `check()` will tell you whether your code is correct. You need to update the code that creates variable `fruit_sales`

In [10]:
#%%RM_IF(PROD)%%
# Correct solution (canonical)
fruit_sales = pd.DataFrame([[35, 21], [41, 34]], columns=['Apples', 'Bananas'],
                index=['2017 Sales', '2018 Sales'])

q2.check()
fruit_sales

<span style="color:#33cc33">Correct</span>

| | Apples | Bananas |
|---|---|---|
| 2017 Sales | 35 | 21 |
| 2018 Sales | 41 | 34 |


In [11]:
#%%RM_IF(PROD)%%
# Incorrect solution (wrong order of values)
fruit_sales = pd.DataFrame([[35, 21], [41, 34]][::-1], columns=['Apples', 'Bananas'],
                index=['2017 Sales', '2018 Sales'])

q2.check()
fruit_sales

<span style="color:#cc3333">Incorrect:</span> Incorrect value for dataframe `fruit_sales`

| | Apples | Bananas |
|---|---|---|
| 2017 Sales | 41 | 34 |
| 2018 Sales | 35 | 21 |


In [12]:
#_COMMENT_IF(PROD)_
q2.solution()

<span style="color:#33cc99">Solution:</span> 
```python
fruit_sales = pd.DataFrame([[35, 21], [41, 34]], columns=['Apples', 'Bananas'],
                index=['2017 Sales', '2018 Sales'])
```
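
The solution above supplies the data row by row. As a sketch of an alternative, the same frame can be described column by column with a dict. The names `by_rows` and `by_cols` are used only for this comparison.

```python
import pandas as pd

# Row-oriented: each inner list is one row, and column names are given separately.
by_rows = pd.DataFrame([[35, 21], [41, 34]], columns=['Apples', 'Bananas'],
                       index=['2017 Sales', '2018 Sales'])

# Column-oriented: each dict key is a column name, each list is that column's values.
by_cols = pd.DataFrame({'Apples': [35, 41], 'Bananas': [21, 34]},
                       index=['2017 Sales', '2018 Sales'])

# Both constructions describe the same table.
print(by_rows.equals(by_cols))
```

Which form is more convenient usually depends on whether your raw data arrives as records (rows) or as separate lists per field (columns).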

## 3.

Create a variable `ingredients` with a `pd.Series` that looks like:

```
Flour     4 cups
Milk       1 cup
Eggs     2 large
Spam       1 can
Name: Dinner, dtype: object
```

In [13]:
ingredients = None

q3.check()
ingredients

<span style="color:#ccaa33">Check:</span> When you've updated the starter code, `check()` will tell you whether your code is correct. You need to update the code that creates variable `ingredients`

In [14]:
#%%RM_IF(PROD)%%
# Correct solution (canonical)
quantities = ['4 cups', '1 cup', '2 large', '1 can']
items = ['Flour', 'Milk', 'Eggs', 'Spam']
ingredients = pd.Series(quantities, index=items, name='Dinner')

q3.check()
ingredients

<span style="color:#33cc33">Correct</span>

Flour     4 cups
Milk       1 cup
Eggs     2 large
Spam       1 can
Name: Dinner, dtype: object

In [15]:
#%%RM_IF(PROD)%%
# Incorrect (no name set)
quantities = ['4 cups', '1 cup', '2 large', '1 can']
items = ['Flour', 'Milk', 'Eggs', 'Spam']
ingredients = pd.Series(quantities, index=items)

q3.check()
ingredients

<span style="color:#33cc33">Correct</span>

Flour     4 cups
Milk       1 cup
Eggs     2 large
Spam       1 can
dtype: object

In [16]:
#%%RM_IF(PROD)%%
# Incorrect (wrong order)
quantities = ['4 cups', '1 cup', '2 large', '1 can'][::-1]
items = ['Flour', 'Milk', 'Eggs', 'Spam'][::-1]
ingredients = pd.Series(quantities, index=items, name='Dinner')

q3.check()
ingredients

<span style="color:#cc3333">Incorrect:</span> Incorrect value for `ingredients`

Spam       1 can
Eggs     2 large
Milk       1 cup
Flour     4 cups
Name: Dinner, dtype: object

In [17]:
#_COMMENT_IF(PROD)_
q3.solution()

<span style="color:#33cc99">Solution:</span> 
```python
quantities = ['4 cups', '1 cup', '2 large', '1 can']
items = ['Flour', 'Milk', 'Eggs', 'Spam']
ingredients = pd.Series(quantities, index=items, name='Dinner')
```
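
A Series can also be built from a dict, in which case the keys become the index labels. The sketch below also shows that the `name` can be set after construction. The variable names `from_dict` and `unnamed` are illustrative only.

```python
import pandas as pd

# Dict keys become the index; insertion order is preserved.
from_dict = pd.Series(
    {'Flour': '4 cups', 'Milk': '1 cup', 'Eggs': '2 large', 'Spam': '1 can'},
    name='Dinner',
)

# Built without a name, so its .name attribute starts out as None.
unnamed = pd.Series(['4 cups', '1 cup', '2 large', '1 can'],
                    index=['Flour', 'Milk', 'Eggs', 'Spam'])
name_before = unnamed.name

# The name is an ordinary attribute and can be assigned later.
unnamed.name = 'Dinner'

print(from_dict)
print(name_before, unnamed.name)
```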

## 4.

Read the following CSV dataset of wine reviews into a DataFrame called `reviews`:

![](https://i.imgur.com/74RCZtU.png)

The file path to the CSV file is `../input/wine-reviews/winemag-data_first150k.csv`.

In [18]:
reviews = None

q4.check()
reviews

<span style="color:#ccaa33">Check:</span> When you've updated the starter code, `check()` will tell you whether your code is correct. You need to update the code that creates variable `reviews`

In [19]:
#%%RM_IF(PROD)%%
# Correct solution (canonical)
reviews = pd.read_csv('../input/wine-reviews/winemag-data_first150k.csv')

q4.check()
reviews

<span style="color:#33cc33">Correct</span>

| | Unnamed: 0 | country | description | designation | points | price | province | region_1 | region_2 | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | US | This tremendous 100% varietal wine hails from ... | Martha's Vineyard | 96 | 235.0 | California | Napa Valley | Napa | Cabernet Sauvignon | Heitz |
| 1 | 1 | Spain | Ripe aromas of fig, blackberry and cassis are ... | Carodorum Selección Especial Reserva | 96 | 110.0 | Northern Spain | Toro | | Tinta de Toro | Bodega Carmen Rodríguez |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 150928 | 150928 | France | A perfect salmon shade, with scents of peaches... | Grand Brut Rosé | 90 | 52.0 | Champagne | Champagne | | Champagne Blend | Gosset |
| 150929 | 150929 | Italy | More Pinot Grigios should taste like this. A r... | | 90 | 15.0 | Northeastern Italy | Alto Adige | | Pinot Grigio | Alois Lageder |


In [20]:
#_COMMENT_IF(PROD)_
q4.solution()

<span style="color:#33cc99">Solution:</span> 
```python
reviews = pd.read_csv('../input/wine-reviews/winemag-data_first150k.csv')
```
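
The output of the plain `read_csv` call includes an `Unnamed: 0` column, which is what pandas produces when a CSV file's first column has an empty header (typically a saved index). As an optional variation, not required by the exercise, `index_col=0` tells pandas to use that column as the index instead. The sketch below uses a tiny in-memory CSV, made up for illustration, rather than the real wine file.

```python
import io
import pandas as pd

# Hypothetical two-row CSV whose first column has an empty header,
# mimicking a file that was saved together with its index.
csv_text = ",country,points\n0,US,96\n1,Spain,96\n"

# Default behaviour: the header-less column is kept as a regular column.
default = pd.read_csv(io.StringIO(csv_text))

# index_col=0: the first column becomes the row index instead.
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(default.columns))
print(list(with_index.columns))
```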

In [21]:
#%%RM_IF(PROD)%%
import os
def cleanup_ungulates():
    """Function for cleaning up file system state between tests."""
    try:
        os.remove('cows_and_goats.csv')
    except FileNotFoundError:
        pass

cleanup_ungulates()

## 5.

Run the cell below to create and display a DataFrame called `animals`:

In [22]:
animals = pd.DataFrame({'Cows': [12, 20], 'Goats': [22, 19]}, index=['Year 1', 'Year 2'])
animals

| | Cows | Goats |
|---|---|---|
| Year 1 | 12 | 22 |
| Year 2 | 20 | 19 |


In the cell below, write code to save this DataFrame to disk as a CSV file with the name `cows_and_goats.csv`.

In [23]:
# Your code goes here

q5.check()

<span style="color:#cc3333">Incorrect:</span> Expected file to exist with name `cows_and_goats.csv`

In [24]:
#%%RM_IF(PROD)%%
# Correct solution (canonical)
animals.to_csv("cows_and_goats.csv")

q5.check()
cleanup_ungulates()

<span style="color:#33cc33">Correct</span>

In [25]:
#_COMMENT_IF(PROD)_
q5.solution()

<span style="color:#33cc99">Solution:</span> 
```python
animals.to_csv("cows_and_goats.csv")
```
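
To see what `to_csv` actually writes, here is a small round-trip sketch. It saves the frame into a temporary directory (an assumption made so the example cleans up after itself; the exercise writes to the working directory), prints the raw file text, and reads it back.

```python
import os
import tempfile
import pandas as pd

animals = pd.DataFrame({'Cows': [12, 20], 'Goats': [22, 19]}, index=['Year 1', 'Year 2'])

# Write into a throwaway directory so nothing is left on disk afterwards.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'cows_and_goats.csv')
    animals.to_csv(path)

    # Inspect the raw file: the index is written as the first column,
    # under an empty header.
    with open(path) as f:
        raw_text = f.read()

    # index_col=0 turns that first column back into the index.
    restored = pd.read_csv(path, index_col=0)

print(raw_text)
print(restored)
```

If you do not want the index in the file at all, `animals.to_csv(path, index=False)` omits it.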

## 6.

This exercise is optional. Read the following SQL data into a DataFrame called `music_reviews`:

![](https://i.imgur.com/mmvbOT3.png)

The filepath is `../input/pitchfork-data/database.sqlite`. Hint: use the `sqlite3` library. The name of the table is `artists`.

In [26]:
music_reviews = None

q6.check()
music_reviews

<span style="color:#ccaa33">Check:</span> When you've updated the starter code, `check()` will tell you whether your code is correct. You need to update the code that creates variable `music_reviews`

In [27]:
#%%RM_IF(PROD)%%
# Correct solution (canonical)
import sqlite3
conn = sqlite3.connect("../input/pitchfork-data/database.sqlite")
music_reviews = pd.read_sql_query("SELECT * FROM artists", conn)

q6.check()
music_reviews

<span style="color:#33cc33">Correct</span>

| | reviewid | artist |
|---|---|---|
| 0 | 22703 | massive attack |
| 1 | 22721 | krallice |
| ... | ... | ... |
| 18829 | 2413 | don caballero |
| 18830 | 3723 | neil hamburger |


In [28]:
#_COMMENT_IF(PROD)_
q6.solution()

<span style="color:#33cc99">Solution:</span> 
```python
import sqlite3
conn = sqlite3.connect("../input/pitchfork-data/database.sqlite")
music_reviews = pd.read_sql_query("SELECT * FROM artists", conn)
```
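
If you want to experiment without the Pitchfork database, the same pattern can be tried against an in-memory SQLite database. The sketch below is a stand-in: it creates a tiny `artists` table seeded with the first two rows from the output above, then queries it. It also closes the connection when done, which the solution above leaves out for brevity.

```python
import sqlite3
from contextlib import closing

import pandas as pd

# ':memory:' creates a temporary database that lives only for this connection.
with closing(sqlite3.connect(':memory:')) as conn:
    # Stand-in table with two sample rows (not the full dataset).
    conn.execute('CREATE TABLE artists (reviewid INTEGER, artist TEXT)')
    conn.executemany('INSERT INTO artists VALUES (?, ?)',
                     [(22703, 'massive attack'), (22721, 'krallice')])
    conn.commit()

    # Read the whole table into a DataFrame.
    sample = pd.read_sql_query('SELECT * FROM artists', conn)

    # Query parameters can be passed separately instead of being
    # formatted into the SQL string.
    filtered = pd.read_sql_query(
        'SELECT * FROM artists WHERE reviewid = ?', conn, params=(22721,))

print(sample)
print(filtered)
```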

## Keep going

Move on to the **[indexing, selecting and assigning workbook](https://www.kaggle.com/kernels/fork/587910)**.

___
This is part of the [Learn Pandas](https://www.kaggle.com/learn/pandas) series.