### <font color='#B31B1B'>  Pandas Practice Solutions </font>

**Warning:** Give the questions a go before you look at the solutions (you'll learn more!). Also, there is more than one way to solve each of these problems.

Use the Airbnb data (`airbnb.csv`) to answer the following:

In [1]:
import pandas as pd

airbnb = pd.read_csv('data/airbnb.csv')

#### <font color='#B31B1B'>  Exercise 1 </font>

Alice is going to Lisbon for a week with her husband and 2 kids. They are looking for a full apartment with separate rooms for parents and children. Money is not an issue for them, but they are looking for a good place. This means they are only looking for places with more than 10 reviews and a score above 4. When we show Alice our listing selection, we need to sort the listings from the best score to the worst. If some listings have the same score, we sort them by the number of reviews (the more the better). We need to give her 3 alternatives.

In [4]:
#First we apply all our filters on rooms
#Then we sort values first by score, then reviews (ascending = False so better scores come first)
#We then get the top 3 results
(airbnb
 .query("(room_type == 'Entire home/apt') and (bedrooms >= 2) and (accommodates >= 4) and (reviews > 10) and (overall_satisfaction > 4)")
 .sort_values(by=['overall_satisfaction','reviews'], ascending=[False,False])
 .head(3)
)

| | room_id | host_id | room_type | neighborhood | reviews | overall_satisfaction | accommodates | bedrooms | price |
|---|---|---|---|---|---|---|---|---|---|
| 120 | 176153 | 842219 | Entire home/apt | Misericórdia | 438 | 5.0 | 4 | 2.0 | 102.0 |
| 16 | 44043 | 192830 | Entire home/apt | Santa Maria Maior | 316 | 5.0 | 7 | 3.0 | 80.0 |
| 140 | 202150 | 989393 | Entire home/apt | Santa Maria Maior | 274 | 5.0 | 4 | 2.0 | 62.0 |
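
As an alternative to `.query`, the same filter-then-sort pattern can be sketched with a boolean mask. The `toy` DataFrame below is made-up data that only mimics some columns of `airbnb.csv`; it is an assumption for illustration, not part of the real dataset.

```python
import pandas as pd

# Hypothetical toy listings (NOT the real airbnb.csv), same column names
toy = pd.DataFrame({
    "room_id": [1, 2, 3, 4, 5, 6],
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room",
                  "Entire home/apt", "Entire home/apt", "Entire home/apt"],
    "bedrooms": [2.0, 3.0, 2.0, 1.0, 2.0, 2.0],
    "reviews": [50, 200, 300, 80, 5, 120],
    "overall_satisfaction": [5.0, 5.0, 5.0, 5.0, 5.0, 4.5],
})

# Combine the conditions with & (each one wrapped in parentheses)
mask = (
    (toy["room_type"] == "Entire home/apt")
    & (toy["bedrooms"] >= 2)
    & (toy["reviews"] > 10)
    & (toy["overall_satisfaction"] > 4)
)

# Sort by score first; ties on score are broken by the number of reviews
top3 = (
    toy[mask]
    .sort_values(by=["overall_satisfaction", "reviews"], ascending=[False, False])
    .head(3)
)
print(top3["room_id"].tolist())
```

In this toy data, rooms 1 and 2 share the top score, so the one with more reviews comes first; room 6 follows with a lower score.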


#### <font color='#B31B1B'>  Exercise 2 </font>

Diana is going to spend 3 nights in Lisbon and she wants to meet new people. She has a budget of 50€. We need to give her the 10 cheapest listings, with a preference for shared rooms. We need to sort the rooms by score (descending).

In [6]:
#Here we first look only at shared rooms
#Then get the 10 cheapest ones
#Then sort them by score
(airbnb
 .query('room_type == \'Shared room\'')
 .sort_values(by=['price'])
 .head(10)
 .sort_values('overall_satisfaction', ascending=False)
)

| | room_id | host_id | room_type | neighborhood | reviews | overall_satisfaction | accommodates | bedrooms | price |
|---|---|---|---|---|---|---|---|---|---|
| 11789 | 18154378 | 124601372 | Shared room | Santo António | 4 | 4.5 | 4 | 1.0 | 11.0 |
| 5616 | 9317561 | 48360716 | Shared room | Arroios | 13 | 4.5 | 4 | 1.0 | 11.0 |
| 3562 | 5557699 | 28812904 | Shared room | Santa Maria Maior | 22 | 4.0 | 1 | 1.0 | 10.0 |
| 1010 | 1179457 | 5799522 | Shared room | Santo António | 42 | 4.0 | 16 | 1.0 | 10.0 |
| 3597 | 5610245 | 29084261 | Shared room | Santa Maria Maior | 8 | 4.0 | 4 | 1.0 | 11.0 |
| 6639 | 11693115 | 28812904 | Shared room | Santa Maria Maior | 5 | 3.5 | 1 | 1.0 | 11.0 |
| 4354 | 6728398 | 28812904 | Shared room | Santa Maria Maior | 3 | 2.5 | 1 | 1.0 | 11.0 |
| 13148 | 19314160 | 135270245 | Shared room | Santa Clara | 0 | 0.0 | 1 | 1.0 | 10.0 |
| 7584 | 13116032 | 72951043 | Shared room | Arroios | 1 | 0.0 | 8 | 1.0 | 10.0 |
| 11787 | 18154042 | 124601372 | Shared room | Avenidas Novas | 1 | 0.0 | 6 | 1.0 | 11.0 |
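
Another way to pick the cheapest rows is `DataFrame.nsmallest`, which replaces the `sort_values(...).head(...)` pair. The sketch below uses made-up toy data (an assumption, not the real `airbnb.csv`) and keeps only 3 rows so the result is easy to check by eye.

```python
import pandas as pd

# Hypothetical toy listings (NOT the real airbnb.csv)
toy = pd.DataFrame({
    "room_id": [1, 2, 3, 4, 5],
    "room_type": ["Shared room", "Shared room", "Private room",
                  "Shared room", "Shared room"],
    "price": [12.0, 10.0, 9.0, 30.0, 11.0],
    "overall_satisfaction": [4.0, 3.5, 5.0, 5.0, 4.5],
})

cheapest = (
    toy[toy["room_type"] == "Shared room"]  # keep shared rooms only
    .nsmallest(3, "price")                  # the 3 cheapest (use 10 on the real data)
    .sort_values("overall_satisfaction", ascending=False)  # best score first
)
print(cheapest[["room_id", "price", "overall_satisfaction"]])
```

Note the order of operations: the cheapest rows are selected first and only then sorted by score, so an expensive room with a high score (room 4 here) never makes the list.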


For the following questions use the US primary results data (`primary_results.csv`).

In [7]:
votes = pd.read_csv('data/primary_results.csv')

In [8]:
votes.head()

| | state | state_abbreviation | county | fips | party | candidate | votes | fraction_votes |
|---|---|---|---|---|---|---|---|---|
| 0 | Alabama | AL | Autauga | 1001.0 | Democrat | Bernie Sanders | 544 | 0.182 |
| 1 | Alabama | AL | Autauga | 1001.0 | Democrat | Hillary Clinton | 2387 | 0.8 |
| 2 | Alabama | AL | Baldwin | 1003.0 | Democrat | Bernie Sanders | 2694 | 0.329 |
| 3 | Alabama | AL | Baldwin | 1003.0 | Democrat | Hillary Clinton | 5290 | 0.647 |
| 4 | Alabama | AL | Barbour | 1005.0 | Democrat | Bernie Sanders | 222 | 0.078 |


#### <font color='#B31B1B'>  Exercise 3 </font>

Overall, what percentage of the votes did each party get?

In [13]:
# Here we first group votes by party affiliation and then sum
# We then normalize by the total number of votes
votes.groupby('party')['votes'].sum() / votes['votes'].sum() * 100

# Remember: these are primary votes, not general election votes (and the Republican primary was more competitive)!

party
Democrat      48.733082
Republican    51.266918
Name: votes, dtype: float64
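
To make the normalization step concrete, here is a small sketch on made-up numbers (the `toy_votes` frame is an assumption, not the real `primary_results.csv`). It splits the one-liner into two named steps so each intermediate result can be inspected.

```python
import pandas as pd

# Hypothetical toy results (NOT the real primary_results.csv)
toy_votes = pd.DataFrame({
    "party": ["Democrat", "Democrat", "Republican", "Republican"],
    "candidate": ["A", "B", "C", "D"],
    "votes": [100, 200, 300, 400],
})

# Step 1: total votes per party (a Series indexed by party)
party_totals = toy_votes.groupby("party")["votes"].sum()

# Step 2: divide by the grand total and scale to a percentage
party_share = (party_totals / party_totals.sum() * 100).round(1)
print(party_share)
```

Because every row belongs to exactly one party, `party_totals.sum()` equals `toy_votes['votes'].sum()`, and the shares add up to 100.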

#### <font color='#B31B1B'>  Exercise 4 </font>

Which Democratic candidate got the most votes in the state of New York?

In [21]:
# We first pull out the Democratic votes in NY
# We then sum them for each candidate
# Then we sort by vote count and return the candidate with the most votes
(votes
 .query('(party == \'Democrat\') and (state_abbreviation == \'NY\')')
 .groupby('candidate')['votes'].sum()
 .reset_index()
 .sort_values('votes',ascending=False)
 .head(1)
 ['candidate']
)

1    Hillary Clinton
Name: candidate, dtype: object
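
If only the name is needed, `Series.idxmax` is a shorter alternative to sorting: it returns the index label of the largest value. A minimal sketch on made-up data (the `toy_votes` frame and its candidates are assumptions, not the real results):

```python
import pandas as pd

# Hypothetical toy results (NOT the real primary_results.csv)
toy_votes = pd.DataFrame({
    "state_abbreviation": ["NY", "NY", "NY", "NY", "NY", "VT"],
    "party": ["Democrat", "Democrat", "Democrat", "Democrat",
              "Republican", "Democrat"],
    "candidate": ["A", "B", "A", "B", "C", "B"],
    "votes": [100, 150, 80, 20, 500, 900],
})

# Keep only Democratic rows from NY
ny_dem = toy_votes[
    (toy_votes["party"] == "Democrat")
    & (toy_votes["state_abbreviation"] == "NY")
]

# Sum votes per candidate; idxmax returns the candidate (index label) with the largest total
winner = ny_dem.groupby("candidate")["votes"].sum().idxmax()
print(winner)
```

Unlike the sorted version, this returns a plain string rather than a one-row Series, and on a tie it returns only the first candidate with the maximum.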

#### <font color='#B31B1B'>  Exercise 5 </font>


Let's call a state Democrat if the Democratic candidates got more votes there, and Republican if the Republican candidates got more votes. Which states are Democrat and which are Republican?


*hint: one way to find out is by doing a pivot table using the sum as an aggregating function*

In [28]:
#Here we use a pivot table to get total votes for each party in each state
votes_by_party = pd.pivot_table(votes, values=["votes"], index="state",
                                   columns="party", aggfunc="sum")["votes"].reset_index()

# We then get the Democrat and Republican states by filtering on which party got more votes
democrat_states = votes_by_party[votes_by_party.Democrat > votes_by_party.Republican].state.to_list()
republican_states = votes_by_party[votes_by_party.Democrat < votes_by_party.Republican].state.to_list()

In [29]:
democrat_states

['California',
 'Connecticut',
 'Delaware',
 'Hawaii',
 'Illinois',
 'Kentucky',
 'Louisiana',
 'Maryland',
 'Massachusetts',
 'New Jersey',
 'New Mexico',
 'New York',
 'Oregon',
 'Pennsylvania',
 'Rhode Island',
 'Vermont',
 'West Virginia']
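
The pivot table is one option; a `groupby` followed by `unstack` builds the same state-by-party table. The sketch below runs on made-up data (the `toy_votes` frame and the state names "X" and "Y" are assumptions) and uses `idxmax(axis=1)` to label each state with its leading party.

```python
import pandas as pd

# Hypothetical toy results (NOT the real primary_results.csv)
toy_votes = pd.DataFrame({
    "state": ["X", "X", "X", "Y", "Y", "Y"],
    "party": ["Democrat", "Republican", "Republican",
              "Democrat", "Democrat", "Republican"],
    "votes": [100, 30, 40, 10, 20, 50],
})

# One row per state, one column per party, cells hold total votes
by_party = toy_votes.groupby(["state", "party"])["votes"].sum().unstack("party")

# For each state (row), the column name with the largest total
leading_party = by_party.idxmax(axis=1)

democrat_states = leading_party[leading_party == "Democrat"].index.to_list()
republican_states = leading_party[leading_party == "Republican"].index.to_list()
print(democrat_states, republican_states)
```

One difference worth keeping in mind: on an exact tie, `idxmax` picks the first column, whereas the strict `>` and `<` comparisons would leave a tied state out of both lists.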