### Gregory Fernandes - TIA 41706692
### José Eduardo - TIA 41720504
### Matheus Gois - TIA 41746491


# 100 pandas puzzles

Inspired by [100 Numpy exercises](https://github.com/rougier/numpy-100), here are 100* short puzzles for testing your knowledge of [pandas'](http://pandas.pydata.org/) power.

Since pandas is a large library with many different specialist features and functions, these exercises focus mainly on the fundamentals of manipulating data (indexing, grouping, aggregating, cleaning), making use of the core DataFrame and Series objects.

Many of the exercises here are straightforward in that the solutions require no more than a few lines of code (in pandas or NumPy... don't go using pure Python or Cython!). Choosing the right methods and following best practices is the underlying goal.

The exercises are loosely divided into sections. Each section has a difficulty rating; these ratings are subjective, of course, but should be seen as a rough guide as to how inventive the required solution is.

If you're just starting out with pandas and you are looking for some other resources, the official documentation is very extensive. In particular, some good places to get a broader overview of pandas are...

- [10 minutes to pandas](http://pandas.pydata.org/pandas-docs/stable/10min.html)
- [pandas basics](http://pandas.pydata.org/pandas-docs/stable/basics.html)
- [tutorials](http://pandas.pydata.org/pandas-docs/stable/tutorials.html)
- [cookbook and idioms](http://pandas.pydata.org/pandas-docs/stable/cookbook.html#cookbook)

Enjoy the puzzles!

\* *the list of exercises is not yet complete! Pull requests or suggestions for additional exercises, corrections and improvements are welcomed.*

## DataFrame basics

### A few of the fundamental routines for selecting, sorting, adding and aggregating data in DataFrames

Difficulty: *easy*

Note: remember to import numpy using:
```python
import numpy as np
```

Consider the following Python dictionary `data` and Python list `labels`:

``` python
data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
```
(This is just some meaningless data I made up with the theme of animals and trips to a vet.)

**4.** Create a DataFrame `df` from this dictionary `data` which has the index `labels`.

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [119]:
data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

df = pd.DataFrame(data,index=labels)
df

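|   | age | animal | priority | visits |
|---|-----|--------|----------|--------|
| a | 2.5 | cat | yes | 1 |
| b | 3.0 | cat | yes | 3 |
| c | 0.5 | snake | no | 2 |
| d | NaN | dog | yes | 3 |
| e | 5.0 | dog | no | 2 |
| f | 2.0 | cat | no | 3 |
| g | 4.5 | snake | no | 1 |
| h | NaN | cat | yes | 1 |
| i | 7.0 | dog | no | 2 |
| j | 3.0 | dog | no | 1 |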


**6.** Return the first 3 rows of the DataFrame `df`.

# head and tail

In [11]:
df.head(3)

|   | age | animal | priority | visits |
|---|-----|--------|----------|--------|
| a | 2.5 | cat | yes | 1 |
| b | 3.0 | cat | yes | 3 |
| c | 0.5 | snake | no | 2 |
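
The heading above mentions both `head` and `tail`. As a minimal sketch of how the two relate, assuming the same `data` and `labels` defined earlier (the names `first_three`, `last_three` and `same_as_head` are illustrative only):

```python
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, index=labels)

first_three = df.head(3)    # rows a, b, c
last_three = df.tail(3)     # rows h, i, j
same_as_head = df.iloc[:3]  # positional slicing selects the same rows as head(3)
```

`head` and `tail` default to 5 rows when called without an argument.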


**7.** Select just the 'animal' and 'age' columns from the DataFrame `df`.

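# loc: select only the two requested columns

In [57]:
df.loc[:,['animal','age']]

|   | animal | age |
|---|--------|-----|
| a | cat | 2.5 |
| b | cat | 3.0 |
| c | snake | 0.5 |
| d | dog | NaN |
| e | dog | 5.0 |
| f | cat | 2.0 |
| g | snake | 4.5 |
| h | cat | NaN |
| i | dog | 7.0 |
| j | dog | 3.0 |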


**10.** Select the rows where the age is missing, i.e. is `NaN`.

# Selects the specific rows and columns

In [53]:
df.iloc[[3,7],[0,1,2,3]]

|   | age | animal | priority | visits |
|---|-----|--------|----------|--------|
| d | NaN | dog | yes | 3 |
| h | NaN | cat | yes | 1 |
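
Hard-coded row positions only work for this exact data. One possible alternative is a boolean mask built with `isna()`, which finds the missing ages wherever they occur. A sketch under the same `data` and `labels` (`missing_age` is an illustrative name):

```python
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, index=labels)

# isna() marks each NaN with True; indexing with the mask keeps only those rows
missing_age = df[df['age'].isna()]
```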


**11.** Select the rows where the animal is a cat *and* the age is less than 3.

# Filters the rows according to the column values

In [111]:
df2=df[df.age < 3]
df2=df2[df2['animal'].isin(['cat'])]
df2

|   | age | animal | priority | visits |
|---|-----|--------|----------|--------|
| a | 2.5 | cat | yes | 1 |
| f | 2.0 | cat | no | 3 |
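
The two filtering steps can also be combined into a single boolean mask with `&`. A minimal sketch, again assuming the same `data` and `labels` (`young_cats` is an illustrative name):

```python
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, index=labels)

# Each comparison needs its own parentheses because & binds tighter than == and <
young_cats = df[(df['animal'] == 'cat') & (df['age'] < 3)]
```

Comparisons against `NaN` evaluate to `False`, so row `h` (a cat with a missing age) is excluded.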


**13.** Change the age in row 'f' to 1.5.

In [85]:
df3=df.copy()
df3.loc['f','age']=1.5
df3

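|   | age | animal | priority | visits |
|---|-----|--------|----------|--------|
| a | 2.5 | cat | yes | 1 |
| b | 3.0 | cat | yes | 3 |
| c | 0.5 | snake | no | 2 |
| d | NaN | dog | yes | 3 |
| e | 5.0 | dog | no | 2 |
| f | 1.5 | cat | no | 3 |
| g | 4.5 | snake | no | 1 |
| h | NaN | cat | yes | 1 |
| i | 7.0 | dog | no | 2 |
| j | 3.0 | dog | no | 1 |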


**14.** Calculate the sum of all visits (the total number of visits).

In [106]:
df4 = df.loc[:,['visits']]
df4.sum()

visits    19
dtype: int64
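
Selecting the column with single brackets gives a Series rather than a one-column DataFrame, so `sum()` returns a plain number instead of a labelled result. A sketch assuming the same `data` and `labels` (`total_visits` is an illustrative name):

```python
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, index=labels)

# df['visits'] is a Series, so the sum is a single scalar value
total_visits = df['visits'].sum()
print(total_visits)
```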

**15.** Calculate the mean age for each different animal in `df`.

In [130]:
cat=df[df['animal'].isin(['cat'])]
dog=df[df['animal'].isin(['dog'])]
snake=df[df['animal'].isin(['snake'])]

cat = cat.loc[:,['age']]
print('cat :',cat.mean())
dog = dog.loc[:,['age']]
print('dog :',dog.mean())
snake = snake.loc[:,['age']]
print('snake :',snake.mean())

cat : age    2.5
dtype: float64
dog : age    5.0
dtype: float64
snake : age    2.5
dtype: float64
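
Filtering each animal by hand does not scale to more categories. One possible alternative is `groupby`, which computes the mean for every distinct animal in one call. A sketch assuming the same `data` and `labels` (`mean_age` is an illustrative name):

```python
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, index=labels)

# Group rows by animal, then average the age column; NaN values are skipped by default
mean_age = df.groupby('animal')['age'].mean()
print(mean_age)
```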


**17.** Count the number of each type of animal in `df`.

In [182]:
cat=df[df['animal'].isin(['cat'])]
dog=df[df['animal'].isin(['dog'])]
snake=df[df['animal'].isin(['snake'])]

print('cat :',len(cat))
print('dog :',len(dog))
print('snake :',len(snake))

cat : 4
dog : 4
snake : 2
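
Counting the occurrences of each value in a column is what `value_counts()` is for. A minimal sketch, assuming the same `data` and `labels` (`animal_counts` is an illustrative name):

```python
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, index=labels)

# Returns a Series indexed by animal name, with the number of rows for each
animal_counts = df['animal'].value_counts()
print(animal_counts)
```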


**18.** Sort `df` first by the values in the 'age' column in *descending* order, then by the values in the 'visits' column in *ascending* order.

In [180]:
c = df.sort_values(by=["age","visits"],ascending=[False,True])
c

|   | age | animal | priority | visits |
|---|-----|--------|----------|--------|
| i | 7.0 | dog | no | 2 |
| e | 5.0 | dog | no | 2 |
| g | 4.5 | snake | no | 1 |
| j | 3.0 | dog | no | 1 |
| b | 3.0 | cat | yes | 3 |
| a | 2.5 | cat | yes | 1 |
| f | 2.0 | cat | no | 3 |
| c | 0.5 | snake | no | 2 |
| h | NaN | cat | yes | 1 |
| d | NaN | dog | yes | 3 |
