# Python/Pandas Assessment

### Setup
- Clone the repository containing this file to your `~/codeup-data-science/` folder.
- You'll notice that all files are gitignored. This ensures that your work _does not_ get pushed to GitHub. Sharing test questions or answers is an academic integrity issue, so we need to avoid that issue entirely. Do not add this repo to GitHub.
- Upload your completed notebook to the appropriate Google Classroom assignment.

### Orientation
- There are 10 exercises on this assessment worth 10 points each.
- Credit is given for programmatic solutions only; your code shows your work. Because the expected answer is visible in the unit test code, a function that simply contains `return 44`, for example, will not earn credit.
- Your Python/pandas code should run without errors.
- After each problem prompt, there is a cell to write your code, followed by another cell with a unit test.
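
To illustrate the difference between a hardcoded and a programmatic solution, here is a minimal sketch. The two small dataframes and both function names are hypothetical and exist only for this illustration; they are not part of the assessment data.

```python
import pandas as pd

# Hypothetical frames used only for illustration (not the assessment data)
three_rows = pd.DataFrame({"total_bill": [16.99, 10.34, 21.01]})
five_rows = pd.DataFrame({"total_bill": [16.99, 10.34, 21.01, 23.68, 24.59]})


def hardcoded_row_count(df):
    # Copies the expected answer; only right for one specific dataframe
    return 3


def programmatic_row_count(df):
    # Computes the answer from whatever dataframe is passed in
    return len(df)


# Both agree on the frame the hardcoded value was copied from...
assert hardcoded_row_count(three_rows) == programmatic_row_count(three_rows)
# ...but only the programmatic version stays correct on other data
assert programmatic_row_count(five_rows) == len(five_rows)
assert hardcoded_row_count(five_rows) != len(five_rows)
```

A hardcoded return can pass a single unit test, but it does not show how the answer was obtained, which is why it earns no credit.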

### Troubleshooting
If you need a fresh start, go to Kernel and then "Restart and Clear Output" in this Jupyter Notebook.

In [1]:
# Required Imports and data acquisition
import pandas as pd
from pydataset import data

df = data("tips")
df.head()

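| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 1 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 2 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 3 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 |
| 4 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 5 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |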


####  EXAMPLE: Write a function named `exercise0`
- This function should accept a dataframe as its input argument
- Notice that the example function is returning the appropriate, programmatic code to obtain the solution
- The `assert` line checks the exercise solution code to ensure correctness

In [None]:
# This example function is solved below:
def exercise0(df):
    return len(df)

assert exercise0(df) == 244
print("Exercise 0 example exercise is complete.")

####  Write a function named `exercise1`
- Use the cell below to write your code
- This function should accept a dataframe as its input argument
- This function should return the highest `total_bill` value from the tips dataframe

In [6]:
# Write your code for the exercise1 function here
def exercise1(dataframe):
    # Use the argument, not the global df, so the function works on any dataframe
    return dataframe.total_bill.max()

In [7]:
assert exercise1(df) == 50.81
print("Exercise 1 is complete") 

Exercise 1 is complete


####  Write a function named `exercise2`
- Use the cell below to write your code
- This function should return the number of different days in the `day` column.
- This function should accept a dataframe as its input argument.

In [8]:
# Write your code for the exercise2 function definition here
def exercise2(dataframe):
    # Count the distinct values in the day column of the dataframe passed in
    return dataframe.day.nunique()

In [9]:
assert exercise2(df) == 4
print("Exercise 2 is complete")

Exercise 2 is complete


####  Write a function named `exercise3`
- Use the cell below to write your `exercise3` function definition
- This function should return the number of rows that represent "Lunch" time tables
- A "table" in this dataset is a single row, representing one bill, _not_ the number of people at that table
- This function should accept a dataframe as its input argument

In [13]:
# Write your code for the exercise3 function here
def exercise3(dataframe):
    # Keep only the Lunch rows of the dataframe passed in, then count them
    lunch = dataframe[dataframe.time == 'Lunch']
    return len(lunch)

In [14]:
assert exercise3(df) == 68
print("Exercise 3 is correct")

Exercise 3 is correct


####  Exercise 4 is one line of pandas code, not a function
- Use the cell below to write the code necessary to rename the `size` column to `table_size` on the `df` variable.
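- Remember that `.size` is already a built-in DataFrame attribute in pandas, so `df.size` returns the number of elements rather than the column. It helps to rename columns that share a name with a built-in attribute.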
- Exercise 4 code is not a function, but should be 1 line of pandas code. 
- Be certain to update the `df` variable or mutate it accordingly, so that `df` has the new column name.
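
The naming conflict behind this exercise can be seen in a small sketch. The three-row `demo` frame is hypothetical and only mimics the `size` column of the tips data.

```python
import pandas as pd

# Hypothetical three-row frame that mimics the `size` column of the tips data
demo = pd.DataFrame({"tip": [1.01, 1.66, 3.50], "size": [2, 3, 3]})

# Attribute access resolves to the built-in DataFrame.size property:
# number of rows * number of columns = 3 * 2
element_count = demo.size

# Bracket access still reaches the column itself
size_column = demo["size"].tolist()

# After renaming, attribute access returns the column as expected
renamed = demo.rename(columns={"size": "table_size"})
table_sizes = renamed.table_size.tolist()

print(element_count)
print(size_column)
print(table_sizes)
```

Bracket access always works, so the rename is a convenience for dot notation rather than a requirement.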

In [18]:
# Write your pandas code to rename the "size" column to "table_size"
df = df.rename(columns={'size': 'table_size'})

In [19]:
assert df.columns.tolist() == ['total_bill', 'tip', 'sex', 'smoker', 'day', 'time', 'table_size']
print("Exercise 4 is complete")

Exercise 4 is complete


#### Write a function named `exercise5`
- This function should return the proportion of lunch tables out of all tables
- A "table" in this dataset is a single row, representing one bill, _not_ the number of people at that table
- You can use the full decimal or choose to round to 2 decimal places. Either answer will earn credit 
- This function should accept a dataframe as its input argument
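
As a side note on computing proportions, the mean of a boolean mask equals the share of `True` values. The sketch below uses a hypothetical five-row frame, not the tips data, and shows that this matches the count-divided-by-count approach.

```python
import pandas as pd

# Hypothetical five-row frame used only for illustration
demo = pd.DataFrame({"time": ["Lunch", "Dinner", "Dinner", "Lunch", "Dinner"]})

# Boolean mask: True where the row is a Lunch table
is_lunch = demo["time"] == "Lunch"

# True counts as 1 and False as 0, so the mean is the proportion of Lunch rows
share_from_mean = float(is_lunch.mean())

# Equivalent calculation: matching rows divided by all rows
share_from_counts = len(demo[is_lunch]) / len(demo)

# Optional rounding to 2 decimal places
rounded_share = round(share_from_mean, 2)
```

Either form is programmatic; which one reads better is a matter of taste.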

In [22]:
# Exercise 5 code here
def exercise5(dataframe):
    # Lunch rows divided by all rows of the dataframe passed in
    lunch = dataframe[dataframe.time == 'Lunch']
    return len(lunch) / len(dataframe)

In [23]:
assert exercise5(df) in [0.2786885245901639, 0.28]
print("Exercise 5 is correct")

Exercise 5 is correct


#### Exercise 6
- Write a function named `exercise6`
- This function should return the number of rows where the `total_bill` is greater than the average of all `total_bill` values.
- This function should accept a dataframe as its input argument
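
Counting rows that satisfy a condition can be done in more than one way. The following sketch, which uses a hypothetical four-value series rather than the tips data, compares summing a boolean mask with taking the length of the filtered result.

```python
import pandas as pd

# Hypothetical bills used only for illustration; their mean is 40.0
bills = pd.Series([10.0, 20.0, 30.0, 100.0], name="total_bill")

# Boolean mask: True where a bill exceeds the average bill
above_average = bills > bills.mean()

# Summing the mask counts the True values
count_by_sum = int(above_average.sum())

# Filtering first and taking the length gives the same count
count_by_len = len(bills[above_average])
```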

In [30]:
# Exercise 6 code here
def exercise6(dataframe):
    # Compare each total_bill against the mean of the dataframe passed in
    average_of_total_bill = dataframe['total_bill'].mean()
    more_expensive = dataframe['total_bill'] > average_of_total_bill
    return len(dataframe[more_expensive])

In [31]:
assert exercise6(df) == 99
print("Exercise 6 is correct")

Exercise 6 is correct


#### Exercise 7
- Write a function named `exercise7`
- This function should return the highest `total_bill` value for Thursday dinner tables (each row is a table).
- This function should accept a dataframe as its input argument

| | total_bill | tip | sex | smoker | day | time | table_size |
|---|---|---|---|---|---|---|---|
| 244 | 18.78 | 3.0 | Female | No | Thur | Dinner | 2 |


In [42]:
# Exercise 7 code here
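def exercise7(dataframe):
    # Both conditions must hold, so combine the masks with & (each in parentheses)
    thur_dinner = dataframe[(dataframe.day == 'Thur') & (dataframe.time == 'Dinner')]
    return thur_dinner['total_bill'].max()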


In [43]:
assert exercise7(df) == 18.78
print("Exercise 7 is correct")

Exercise 7 is correct


#### Exercise 8
- Write a function named `exercise8`
- This function should return the highest `total_bill` for tables on Thursday or Friday
- This function should accept a dataframe as its input argument
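
When a column may match any of several values, `Series.isin` is an alternative to chaining comparisons with `|`. The sketch below uses a hypothetical four-row frame, not the tips data, to show that both forms select the same rows.

```python
import pandas as pd

# Hypothetical four-row frame used only for illustration
demo = pd.DataFrame({
    "total_bill": [10.0, 25.5, 18.0, 40.0],
    "day": ["Thur", "Fri", "Sat", "Sun"],
})

# Chained comparisons: each condition needs its own parentheses
either_or = demo[(demo.day == "Thur") | (demo.day == "Fri")]

# isin: one membership test against a list of accepted values
with_isin = demo[demo.day.isin(["Thur", "Fri"])]

# Highest bill among the selected rows
highest = with_isin.total_bill.max()
```

The `isin` form tends to scale better as the list of accepted values grows.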

| | total_bill | tip | sex | smoker | day | time | table_size |
|---|---|---|---|---|---|---|---|
| 198 | 43.11 | 5.0 | Female | Yes | Thur | Lunch | 4 |


In [54]:
# Exercise 8 code here
def exercise8(dataframe):
    # Either day qualifies, so combine the masks with | (no time-of-day filter)
    thur_or_fri = dataframe[(dataframe.day == 'Thur') | (dataframe.day == 'Fri')]
    return thur_or_fri['total_bill'].max()

In [55]:
assert exercise8(df) == 43.11
print("Exercise 8 is correct")

Exercise 8 is correct


#### Exercise 9
- Write a function named `exercise9`
- This function should return the average `total_bill` for tables dining on a Saturday or Sunday
- This function should accept a dataframe as its input argument

In [63]:
# Exercise 9 code here:
def exercise9(dataframe):
    # Weekend rows of the dataframe passed in, then the mean of their bills
    sat_or_sun = dataframe[(dataframe.day == 'Sat') | (dataframe.day == 'Sun')]
    return sat_or_sun.total_bill.mean()

In [64]:
assert exercise9(df) in [20.89300613496933, 20.9]
print("Exercise 9 is correct")

Exercise 9 is correct


#### Exercise 10
- Write a function named `exercise10`
- This function should take in the `prices` series as its input argument.
- This function should clean these strings by removing the dollar signs and commas, then convert them into proper floats.
- The `exercise10` function should return a series containing only floats
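
One possible way to approach this kind of cleaning is a single regular-expression replacement followed by a type conversion. This is only a sketch: the `raw_amounts` series and the `to_float_series` helper are hypothetical names, and other approaches (such as chaining `str.strip` and `str.replace`) work equally well.

```python
import pandas as pd

# Hypothetical currency strings used only for illustration
raw_amounts = pd.Series(["$1,000.50", "$12.00", "$7,654,321.25"])


def to_float_series(series):
    # The character class [$,] matches either a dollar sign or a comma;
    # regex=True makes pandas treat the pattern as a regular expression
    stripped = series.str.replace(r"[$,]", "", regex=True)
    # Convert the remaining numeric strings to floats
    return stripped.astype(float)


cleaned = to_float_series(raw_amounts)
```

Checking `cleaned.dtype` afterwards is a quick way to confirm that the result contains floats rather than strings.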

In [71]:
prices = pd.Series(["$1,234.56", "$2,345,678.99", "$123.45", "$3,333,333.99"])

In [82]:
# Write your function definition for exercise10 here
def exercise10(series):
    # Drop the leading dollar sign and the thousands separators
    cleaned = series.str.strip('$').str.replace(',', '')
    # Convert the remaining numeric strings to floats
    return cleaned.astype('float64')

In [83]:
assert exercise10(prices).values.tolist() == [1234.56, 2345678.99, 123.45, 3333333.99]
print("Exercise 10 is correct.")

Exercise 10 is correct.
