# Useful Methods

Let's cover some useful methods and functions built into pandas. This is only a small sample of what pandas offers, but these are some of the most commonly used.
The [documentation](https://pandas.pydata.org/pandas-docs/stable/reference/index.html) is a great resource for exploring more methods and functions (we will introduce more later in the course).
Here is a list of the functions and methods covered here (click one to jump to that section in this notebook):

* [apply() method](#apply_method)
* [apply() with a function](#apply_function)
* [apply() with a lambda expression](#apply_lambda)
* [apply() on multiple columns](#apply_multiple)
* [describe()](#describe)
* [sort_values()](#sort)
* [corr()](#corr)
* [idxmin and idxmax](#idx)
* [value_counts](#v_c)
* [replace](#replace)
* [unique and nunique](#uni)
* [map](#map)
* [duplicated and drop_duplicates](#dup)
* [between](#bet)
* [sample](#sample)
* [nlargest](#n)


<a id='apply_method'></a>

## The .apply() method

Here we will learn about a very useful method known as **apply**. It lets us apply a custom function to every value in a DataFrame column and collect the results in a new Series.

In [1]:
import pandas as pd
import numpy as np

In [2]:
df = pd.read_csv('data/tips.csv')

In [3]:
df.head()

Unnamed: 0,total_bill,tip,gender,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251


In [4]:
df.shape

(244, 11)

<a id='apply_function'></a>
### apply with a function

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 11 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   total_bill        244 non-null    float64
 1   tip               244 non-null    float64
 2   gender            244 non-null    object 
 3   smoker            244 non-null    object 
 4   day               244 non-null    object 
 5   time              244 non-null    object 
 6   size              244 non-null    int64  
 7   price_per_person  244 non-null    float64
 8   Payer Name        244 non-null    object 
 9   CC Number         244 non-null    int64  
 10  Payment ID        244 non-null    object 
dtypes: float64(3), int64(2), object(6)
memory usage: 21.1+ KB


In [6]:
len(df)

244

In [13]:
def last_four(num):
    return str(num)[-4:]

In [7]:
type(3560325168603410)

int

In [11]:
str(3560325168603410)[12:]

'3410'

In [12]:
str(3560325168603410)[-4:]

'3410'

In [14]:
last_four(3560325168603410)

'3410'

In [15]:
df.head()

Unnamed: 0,total_bill,tip,gender,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251


In [16]:
df['CC Number']

0      3560325168603410
1      4478071379779230
2      6011812112971322
3      4676137647685994
4      4832732618637221
             ...       
239    5296068606052842
240    3506806155565404
241    6011891618747196
242       4375220550950
243    3511451626698139
Name: CC Number, Length: 244, dtype: int64

In [17]:
df['last_four'] = df['CC Number'].apply(last_four)

In [18]:
df.head()

Unnamed: 0,total_bill,tip,gender,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,last_four
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959,3410
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608,9230
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458,1322
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260,5994
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251,7221


In [19]:
df['Payment ID'] = df['Payment ID'].apply(last_four)

In [20]:
df.head()

Unnamed: 0,total_bill,tip,gender,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,last_four
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,2959,3410
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,4608,9230
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,4458,1322
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,5260,5994
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,2251,7221
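
For this particular task there is also a way that avoids a Python-level function call per value: cast the column to strings and slice with the `.str` accessor. The sketch below is a minimal illustration, assuming a small Series built from the first three card numbers shown above; the names `cards`, `via_apply`, and `via_str` are made up for the example.

```python
import pandas as pd

# First three card numbers from the tips data shown above
cards = pd.Series(
    [3560325168603410, 4478071379779230, 6011812112971322],
    name='CC Number',
)

def last_four(num):
    # Convert to text, then keep the final four characters
    return str(num)[-4:]

# Approach used in the notebook: call the function once per value
via_apply = cards.apply(last_four)

# Alternative: convert the whole column to strings, then slice each one
via_str = cards.astype(str).str[-4:]

print(via_apply.tolist())
print(via_str.tolist())
```

Both approaches should produce the same strings; `apply` remains the more general tool when the logic does not map onto an existing string method.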


### Using .apply() with more complex functions

In [21]:
def yelp(price):
    if price < 10:
        return '$'
    elif price >= 10 and price < 30:
        return '$$'
    else:
        return '$$$'

In [22]:
df['yelp'] = df['total_bill'].apply(yelp)

In [25]:
df.head(15)

Unnamed: 0,total_bill,tip,gender,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,last_four,yelp
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,2959,3410,$$
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,4608,9230,$$
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,4458,1322,$$
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,5260,5994,$$
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,2251,7221,$$
5,25.29,4.71,Male,No,Sun,Dinner,4,6.32,Erik Smith,213140353657882,9679,7882,$$
6,8.77,2.0,Male,No,Sun,Dinner,2,4.38,Kristopher Johnson,2223727524230344,5985,344,$
7,26.88,3.12,Male,No,Sun,Dinner,4,6.72,Robert Buck,3514785077705092,8157,5092,$$
8,15.04,1.96,Male,No,Sun,Dinner,2,7.52,Joseph Mcdonald,3522866365840377,6820,377,$$
9,14.78,3.23,Male,No,Sun,Dinner,2,7.39,Jerome Abbott,3532124519049786,3775,9786,$$
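
Binning a numeric column into labeled ranges can also be sketched with `pd.cut`. The example below is an illustration rather than the notebook's method: it assumes a small Series of bills (the value 30.0 is added only to show the upper boundary and is not taken from the data) and uses `right=False` so that each bin includes its left edge, matching the `<` and `>=` comparisons in `yelp`.

```python
import numpy as np
import pandas as pd

# Sample bills; 30.0 is an illustrative boundary value, not from the data
bills = pd.Series([8.77, 10.34, 16.99, 30.0, 50.81], name='total_bill')

def yelp(price):
    if price < 10:
        return '$'
    elif price < 30:
        return '$$'
    else:
        return '$$$'

via_apply = bills.apply(yelp)

# Bins are [-inf, 10), [10, 30), [30, inf) because right=False
via_cut = pd.cut(
    bills,
    bins=[-np.inf, 10, 30, np.inf],
    right=False,
    labels=['$', '$$', '$$$'],
)

print(via_apply.tolist())
print(via_cut.astype(str).tolist())
```

Note that `pd.cut` returns a categorical column, which is why the example converts it with `astype(str)` before comparing.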


<a id='apply_lambda'></a>
### apply with lambda

In [26]:
def simple(num):
    return num*2

In [27]:
lambda num: num*2

<function __main__.<lambda>(num)>

In [29]:
16.99*0.18

3.0582

In [31]:
df['total_bill'].apply(lambda bill:bill*0.18)

0      3.0582
1      1.8612
2      3.7818
3      4.2624
4      4.4262
        ...  
239    5.2254
240    4.8924
241    4.0806
242    3.2076
243    3.3804
Name: total_bill, Length: 244, dtype: float64
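
For simple arithmetic like this, the lambda is not strictly needed: pandas applies arithmetic operators to a whole Series element by element. A minimal sketch, assuming a three-row Series built from the first bills shown above (the names `bills`, `via_apply`, and `via_arith` are illustrative):

```python
import numpy as np
import pandas as pd

# First three total_bill values from the tips data
bills = pd.Series([16.99, 10.34, 21.01], name='total_bill')

# apply calls the lambda once per value
via_apply = bills.apply(lambda bill: bill * 0.18)

# Plain arithmetic operates on the whole Series at once
via_arith = bills * 0.18

# Compare with a tolerance because these are floating-point results
print(np.allclose(via_apply, via_arith))
```

`apply` with a lambda is still the right tool when the per-value logic cannot be written as a direct operation on the column.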

In [34]:
df.head(2)

Unnamed: 0,total_bill,tip,gender,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,last_four,yelp
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,2959,3410,$$
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,4608,9230,$$


In [35]:
df = df.drop(['last_four','yelp'],axis=1)

In [36]:
df.head(2)

Unnamed: 0,total_bill,tip,gender,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,2959
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,4608


<a id='apply_multiple'></a>
## apply that uses multiple columns

Note, there are several ways to do this:

https://stackoverflow.com/questions/19914937/applying-function-with-multiple-arguments-to-create-a-new-pandas-column

In [38]:
df.head(3)

Unnamed: 0,total_bill,tip,gender,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,2959
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,4608
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,4458


In [41]:
def quality(total_bill,tip):
    if tip/total_bill  > 0.25:
        return "Generous"
    else:
        return "Other"

# Equivalent one-liner using a conditional (ternary) expression:
# def quality(total_bill,tip):
#     return "Generous" if tip/total_bill>0.25 else "Other"

In [43]:
df[['total_bill','tip']].head()

Unnamed: 0,total_bill,tip
0,16.99,1.01
1,10.34,1.66
2,21.01,3.5
3,23.68,3.31
4,24.59,3.61


In [53]:
# To understand what is happening internally
# df[['total_bill','tip']]
# df[['total_bill','tip']].head(10)
# df[['total_bill','tip']].head(4).apply(lambda df:print(df),axis=1)
# df[['total_bill','tip']].head(4).apply(lambda df:print(df["total_bill"],df["tip"]),axis=1)


# df[['total_bill','tip']]
# df[['total_bill','tip']].head()
# df[['total_bill','tip']].head()['total_bill']
# df[['total_bill','tip']].head()['tip']

In [51]:
# With axis=1 the lambda receives one row at a time (a Series), so we name it row
df['Tip Quality'] = df[['total_bill','tip']].apply(lambda row: quality(row['total_bill'],row['tip']),axis=1)

In [60]:
df.tail(15)

Unnamed: 0,total_bill,tip,gender,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,Tip Quality
229,22.12,2.88,Female,Yes,Sat,Dinner,2,11.06,Jennifer Russell,4793003293608,3943,Other
230,24.01,2.0,Male,Yes,Sat,Dinner,4,6.0,Michael Osborne,4258682154026,7872,Other
231,15.69,3.0,Male,Yes,Sat,Dinner,3,5.23,Jason Parks,4812333796161,6334,Other
232,11.61,3.39,Male,No,Sat,Dinner,2,5.8,James Taylor,6011482917327995,2124,Generous
233,10.77,1.47,Male,No,Sat,Dinner,2,5.38,Paul Novak,6011698897610858,1467,Other
234,15.53,3.0,Male,Yes,Sat,Dinner,2,7.76,Tracy Douglas,4097938155941930,7220,Other
235,10.07,1.25,Male,No,Sat,Dinner,2,5.04,Sean Gonzalez,3534021246117605,4615,Other
236,12.6,1.0,Male,Yes,Sat,Dinner,2,6.3,Matthew Myers,3543676378973965,5032,Other
237,32.83,1.17,Male,Yes,Sat,Dinner,2,16.42,Thomas Brown,4284722681265508,2929,Other
238,35.83,4.67,Female,No,Sat,Dinner,3,11.94,Kimberly Crane,676184013727,9777,Other


In [62]:
# Same logic written inline; row is a single row of the two selected columns
df['Tip Quality 2'] = df[['total_bill','tip']].apply(lambda row: "Generous" if row['tip']/row['total_bill']>0.25 else "Other",axis=1)

In [63]:
df.tail(15)

Unnamed: 0,total_bill,tip,gender,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,Tip Quality,Tip Quality 2
229,22.12,2.88,Female,Yes,Sat,Dinner,2,11.06,Jennifer Russell,4793003293608,3943,Other,Other
230,24.01,2.0,Male,Yes,Sat,Dinner,4,6.0,Michael Osborne,4258682154026,7872,Other,Other
231,15.69,3.0,Male,Yes,Sat,Dinner,3,5.23,Jason Parks,4812333796161,6334,Other,Other
232,11.61,3.39,Male,No,Sat,Dinner,2,5.8,James Taylor,6011482917327995,2124,Generous,Generous
233,10.77,1.47,Male,No,Sat,Dinner,2,5.38,Paul Novak,6011698897610858,1467,Other,Other
234,15.53,3.0,Male,Yes,Sat,Dinner,2,7.76,Tracy Douglas,4097938155941930,7220,Other,Other
235,10.07,1.25,Male,No,Sat,Dinner,2,5.04,Sean Gonzalez,3534021246117605,4615,Other,Other
236,12.6,1.0,Male,Yes,Sat,Dinner,2,6.3,Matthew Myers,3543676378973965,5032,Other,Other
237,32.83,1.17,Male,Yes,Sat,Dinner,2,16.42,Thomas Brown,4284722681265508,2929,Other,Other
238,35.83,4.67,Female,No,Sat,Dinner,3,11.94,Kimberly Crane,676184013727,9777,Other,Other


In [64]:
import numpy as np

In [67]:
df['Tip Quality 3'] = np.vectorize(quality)(df['total_bill'], df['tip'])

In [68]:
df.tail(15)

Unnamed: 0,total_bill,tip,gender,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,Tip Quality,Tip Quality 2,Tip Quality 3
229,22.12,2.88,Female,Yes,Sat,Dinner,2,11.06,Jennifer Russell,4793003293608,3943,Other,Other,Other
230,24.01,2.0,Male,Yes,Sat,Dinner,4,6.0,Michael Osborne,4258682154026,7872,Other,Other,Other
231,15.69,3.0,Male,Yes,Sat,Dinner,3,5.23,Jason Parks,4812333796161,6334,Other,Other,Other
232,11.61,3.39,Male,No,Sat,Dinner,2,5.8,James Taylor,6011482917327995,2124,Generous,Generous,Generous
233,10.77,1.47,Male,No,Sat,Dinner,2,5.38,Paul Novak,6011698897610858,1467,Other,Other,Other
234,15.53,3.0,Male,Yes,Sat,Dinner,2,7.76,Tracy Douglas,4097938155941930,7220,Other,Other,Other
235,10.07,1.25,Male,No,Sat,Dinner,2,5.04,Sean Gonzalez,3534021246117605,4615,Other,Other,Other
236,12.6,1.0,Male,Yes,Sat,Dinner,2,6.3,Matthew Myers,3543676378973965,5032,Other,Other,Other
237,32.83,1.17,Male,Yes,Sat,Dinner,2,16.42,Thomas Brown,4284722681265508,2929,Other,Other,Other
238,35.83,4.67,Female,No,Sat,Dinner,3,11.94,Kimberly Crane,676184013727,9777,Other,Other,Other


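**np.vectorize()** is a compact alternative that is typically much faster than row-wise `apply` with `axis=1`, so keep it in mind for the future. Note that NumPy's documentation describes it as a convenience wrapper that is essentially a for loop, so it is not true vectorization.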

Full Details:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.vectorize.html
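
When the logic is a simple two-way condition, `np.where` is another option that evaluates the condition on whole columns at once. The sketch below is an illustration under stated assumptions: it uses three rows taken from the tips data (index 0, 232, and 67) and compares the result against the two approaches used above; the variable names are made up for the example.

```python
import numpy as np
import pandas as pd

# Three (total_bill, tip) pairs from the tips data
df = pd.DataFrame({
    'total_bill': [16.99, 11.61, 3.07],
    'tip': [1.01, 3.39, 1.00],
})

def quality(total_bill, tip):
    return "Generous" if tip / total_bill > 0.25 else "Other"

# Row-wise apply: the lambda receives one row (a Series) per call
via_apply = df.apply(lambda row: quality(row['total_bill'], row['tip']), axis=1)

# np.vectorize wraps the scalar function so it accepts whole columns
via_vectorize = np.vectorize(quality)(df['total_bill'], df['tip'])

# np.where computes the ratio and comparison on whole columns
via_where = np.where(df['tip'] / df['total_bill'] > 0.25, 'Generous', 'Other')

print(list(via_apply))
print(list(via_vectorize))
print(list(via_where))
```

All three should agree on these rows. Which one is fastest on a given dataset is best checked by timing them, for example with `%timeit` in a notebook.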

<a id='describe'></a>
### df.describe for statistical summaries

In [69]:
df.describe()

Unnamed: 0,total_bill,tip,size,price_per_person,CC Number
count,244.0,244.0,244.0,244.0,244.0
mean,19.785943,2.998279,2.569672,7.888197,2563496000000000.0
std,8.902412,1.383638,0.9511,2.914234,2369340000000000.0
min,3.07,1.0,1.0,2.88,60406790000.0
25%,13.3475,2.0,2.0,5.8,30407310000000.0
50%,17.795,2.9,2.0,7.255,3525318000000000.0
75%,24.1275,3.5625,3.0,9.39,4553675000000000.0
max,50.81,10.0,6.0,20.27,6596454000000000.0


In [70]:
df.describe().transpose()

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
total_bill,244.0,19.78594,8.902412,3.07,13.3475,17.795,24.1275,50.81
tip,244.0,2.998279,1.383638,1.0,2.0,2.9,3.5625,10.0
size,244.0,2.569672,0.9510998,1.0,2.0,2.0,3.0,6.0
price_per_person,244.0,7.888197,2.914234,2.88,5.8,7.255,9.39,20.27
CC Number,244.0,2563496000000000.0,2369340000000000.0,60406790000.0,30407310000000.0,3525318000000000.0,4553675000000000.0,6596454000000000.0


In [71]:
df.head()

Unnamed: 0,total_bill,tip,gender,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,Tip Quality,Tip Quality 2,Tip Quality 3
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,2959,Other,Other,Other
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,4608,Other,Other,Other
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,4458,Other,Other,Other
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,5260,Other,Other,Other
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,2251,Other,Other,Other


In [73]:
df = df.drop(['Tip Quality 2', 'Tip Quality 3'],axis=1)

In [75]:
df.head()

Unnamed: 0,total_bill,tip,gender,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,Tip Quality
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,2959,Other
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,4608,Other
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,4458,Other
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,5260,Other
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,2251,Other


<a id='sort'></a>
### sort_values()

In [76]:
df.sort_values('tip')

Unnamed: 0,total_bill,tip,gender,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,Tip Quality
92,5.75,1.00,Female,Yes,Fri,Dinner,2,2.88,Leah Ramirez,3508911676966392,3780,Other
111,7.25,1.00,Female,No,Sat,Dinner,1,7.25,Terri Jones,3559221007826887,4801,Other
67,3.07,1.00,Female,Yes,Sat,Dinner,1,3.07,Tiffany Brock,4359488526995267,3455,Generous
236,12.60,1.00,Male,Yes,Sat,Dinner,2,6.30,Matthew Myers,3543676378973965,5032,Other
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,2959,Other
...,...,...,...,...,...,...,...,...,...,...,...,...
141,34.30,6.70,Male,No,Thur,Lunch,6,5.72,Steven Carlson,3526515703718508,1025,Other
59,48.27,6.73,Male,No,Sat,Dinner,4,12.07,Brian Ortiz,6596453823950595,8139,Other
23,39.42,7.58,Male,No,Sat,Dinner,4,9.86,Lance Peterson,3542584061609808,t239,Other
212,48.33,9.00,Male,No,Sat,Dinner,4,12.08,Alex Williamson,676218815212,4590,Other


In [None]:
df.sort_values('tip',ascending=False)

Unnamed: 0,total_bill,tip,gender,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,Tip Quality
170,50.81,10.00,Male,Yes,Sat,Dinner,3,16.94,Gregory Clark,5473850968388236,1954,Other
212,48.33,9.00,Male,No,Sat,Dinner,4,12.08,Alex Williamson,676218815212,4590,Other
23,39.42,7.58,Male,No,Sat,Dinner,4,9.86,Lance Peterson,3542584061609808,t239,Other
59,48.27,6.73,Male,No,Sat,Dinner,4,12.07,Brian Ortiz,6596453823950595,8139,Other
141,34.30,6.70,Male,No,Thur,Lunch,6,5.72,Steven Carlson,3526515703718508,1025,Other
...,...,...,...,...,...,...,...,...,...,...,...,...
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,2959,Other
111,7.25,1.00,Female,No,Sat,Dinner,1,7.25,Terri Jones,3559221007826887,4801,Other
92,5.75,1.00,Female,Yes,Fri,Dinner,2,2.88,Leah Ramirez,3508911676966392,3780,Other
67,3.07,1.00,Female,Yes,Sat,Dinner,1,3.07,Tiffany Brock,4359488526995267,3455,Generous


In [79]:
# Sort by 'tip' first, then by 'size' to break ties
# If you want to reorder the columns afterwards, see:
# https://stackoverflow.com/questions/13148429/how-to-change-the-order-of-dataframe-columns
df.sort_values(['tip','size'])

Unnamed: 0,total_bill,tip,gender,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,Tip Quality
67,3.07,1.00,Female,Yes,Sat,Dinner,1,3.07,Tiffany Brock,4359488526995267,3455,Generous
111,7.25,1.00,Female,No,Sat,Dinner,1,7.25,Terri Jones,3559221007826887,4801,Other
92,5.75,1.00,Female,Yes,Fri,Dinner,2,2.88,Leah Ramirez,3508911676966392,3780,Other
236,12.60,1.00,Male,Yes,Sat,Dinner,2,6.30,Matthew Myers,3543676378973965,5032,Other
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,2959,Other
...,...,...,...,...,...,...,...,...,...,...,...,...
141,34.30,6.70,Male,No,Thur,Lunch,6,5.72,Steven Carlson,3526515703718508,1025,Other
59,48.27,6.73,Male,No,Sat,Dinner,4,12.07,Brian Ortiz,6596453823950595,8139,Other
23,39.42,7.58,Male,No,Sat,Dinner,4,9.86,Lance Peterson,3542584061609808,t239,Other
212,48.33,9.00,Male,No,Sat,Dinner,4,12.08,Alex Williamson,676218815212,4590,Other


In [80]:
df.sort_values(['tip','size'],ascending=False) # descending order

Unnamed: 0,total_bill,tip,gender,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,Tip Quality
170,50.81,10.00,Male,Yes,Sat,Dinner,3,16.94,Gregory Clark,5473850968388236,1954,Other
212,48.33,9.00,Male,No,Sat,Dinner,4,12.08,Alex Williamson,676218815212,4590,Other
23,39.42,7.58,Male,No,Sat,Dinner,4,9.86,Lance Peterson,3542584061609808,t239,Other
59,48.27,6.73,Male,No,Sat,Dinner,4,12.07,Brian Ortiz,6596453823950595,8139,Other
141,34.30,6.70,Male,No,Thur,Lunch,6,5.72,Steven Carlson,3526515703718508,1025,Other
...,...,...,...,...,...,...,...,...,...,...,...,...
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,2959,Other
92,5.75,1.00,Female,Yes,Fri,Dinner,2,2.88,Leah Ramirez,3508911676966392,3780,Other
236,12.60,1.00,Male,Yes,Sat,Dinner,2,6.30,Matthew Myers,3543676378973965,5032,Other
67,3.07,1.00,Female,Yes,Sat,Dinner,1,3.07,Tiffany Brock,4359488526995267,3455,Generous
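
`ascending` also accepts a list with one entry per sort column, which allows mixed directions. A minimal sketch, assuming a three-row frame built from rows 92, 67, and 170 of the tips data (only the `tip` and `size` columns are kept for brevity):

```python
import pandas as pd

# tip and size for rows 92, 67 and 170 of the tips data
df = pd.DataFrame(
    {'tip': [1.00, 1.00, 10.00], 'size': [2, 1, 3]},
    index=[92, 67, 170],
)

# Largest tips first; among equal tips, smallest party size first
result = df.sort_values(['tip', 'size'], ascending=[False, True])

print(result)
```

Here row 170 comes first because of its tip, and row 67 precedes row 92 because its party size is smaller.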


<a id='corr'></a>
## df.corr() for correlation checks

[Wikipedia on Correlation](https://en.wikipedia.org/wiki/Correlation_and_dependence)

In [81]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 12 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   total_bill        244 non-null    float64
 1   tip               244 non-null    float64
 2   gender            244 non-null    object 
 3   smoker            244 non-null    object 
 4   day               244 non-null    object 
 5   time              244 non-null    object 
 6   size              244 non-null    int64  
 7   price_per_person  244 non-null    float64
 8   Payer Name        244 non-null    object 
 9   CC Number         244 non-null    int64  
 10  Payment ID        244 non-null    object 
 11  Tip Quality       244 non-null    object 
dtypes: float64(3), int64(2), object(7)
memory usage: 23.0+ KB


In [82]:
df.corr(numeric_only=True)

Unnamed: 0,total_bill,tip,size,price_per_person,CC Number
total_bill,1.0,0.675734,0.598315,0.647554,0.104576
tip,0.675734,1.0,0.489299,0.347405,0.110857
size,0.598315,0.489299,1.0,-0.175359,-0.030239
price_per_person,0.647554,0.347405,-0.175359,1.0,0.13524
CC Number,0.104576,0.110857,-0.030239,0.13524,1.0


In [84]:
df[['total_bill','tip','size']].corr()

Unnamed: 0,total_bill,tip,size
total_bill,1.0,0.675734,0.598315
tip,0.675734,1.0,0.489299
size,0.598315,0.489299,1.0
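
`corr()` computes the Pearson correlation by default. To check a single pair of columns without building the full matrix, `Series.corr` can be used. The sketch below assumes a small frame made from the first five rows of the tips data, so its numbers will differ from the full-data matrix above.

```python
import pandas as pd

# First five rows of total_bill and tip from the tips data
df = pd.DataFrame({
    'total_bill': [16.99, 10.34, 21.01, 23.68, 24.59],
    'tip': [1.01, 1.66, 3.50, 3.31, 3.61],
})

# Full correlation matrix for the two columns
matrix = df.corr()

# Correlation for one specific pair
pair = df['total_bill'].corr(df['tip'])

print(matrix)
print(pair)
```

The off-diagonal entry of the matrix and the pairwise value should be the same number.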


In [85]:
df.head()

Unnamed: 0,total_bill,tip,gender,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,Tip Quality
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,2959,Other
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,4608,Other
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,4458,Other
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,5260,Other
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,2251,Other


In [86]:
df = df.drop(['Tip Quality'],axis=1)

In [87]:
df.head(2)

Unnamed: 0,total_bill,tip,gender,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,2959
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,4608


<a id='idx'></a>
### idxmin and idxmax

In [88]:
df.head()

Unnamed: 0,total_bill,tip,gender,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,2959
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,4608
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,4458
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,5260
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,2251


In [91]:
df['total_bill'].max()

np.float64(50.81)

In [92]:
max_total_bill = df['total_bill'].max()
print(max_total_bill)

50.81


In [93]:
df['total_bill'].idxmax()

170

In [95]:
df.iloc[169:172]

Unnamed: 0,total_bill,tip,gender,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
169,10.63,2.0,Female,Yes,Sat,Dinner,2,5.32,Amy Hill,3536332481454019,1788
170,50.81,10.0,Male,Yes,Sat,Dinner,3,16.94,Gregory Clark,5473850968388236,1954
171,15.81,3.16,Male,Yes,Sat,Dinner,2,7.9,David Hall,502004138207,6750


In [96]:
# idxmax returns an index label, so .loc is the safe way to look the row up
df.loc[df['total_bill'].idxmax()]

total_bill                     50.81
tip                             10.0
gender                          Male
smoker                           Yes
day                              Sat
time                          Dinner
size                               3
price_per_person               16.94
Payer Name             Gregory Clark
CC Number           5473850968388236
Payment ID                      1954
Name: 170, dtype: object

In [97]:
df['total_bill'].min()

np.float64(3.07)

In [None]:
df['total_bill'].idxmin() # Index label of the minimum value

67

In [99]:
df.iloc[65:70]

Unnamed: 0,total_bill,tip,gender,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
65,20.08,3.15,Male,No,Sat,Dinner,3,6.69,Justin Dixon,180021262464926,6840
66,16.45,2.47,Female,No,Sat,Dinner,2,8.22,Rachel Vaughn,3569262692675583,4750
67,3.07,1.0,Female,Yes,Sat,Dinner,1,3.07,Tiffany Brock,4359488526995267,3455
68,20.23,2.01,Male,No,Sat,Dinner,2,10.12,Mr. Travis Bailey Jr.,60406789937,t561
69,15.01,2.09,Male,Yes,Sat,Dinner,2,7.5,Adam Hall,4700924377057571,t855


In [100]:
df.iloc[67]

total_bill                      3.07
tip                              1.0
gender                        Female
smoker                           Yes
day                              Sat
time                          Dinner
size                               1
price_per_person                3.07
Payer Name             Tiffany Brock
CC Number           4359488526995267
Payment ID                      3455
Name: 67, dtype: object

In [102]:
# idxmin returns an index label, so .loc is the safe way to look the row up
df.loc[df['total_bill'].idxmin()]

total_bill                      3.07
tip                              1.0
gender                        Female
smoker                           Yes
day                              Sat
time                          Dinner
size                               1
price_per_person                3.07
Payer Name             Tiffany Brock
CC Number           4359488526995267
Payment ID                      3455
Name: 67, dtype: object
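
`idxmin` and `idxmax` return index labels, not integer positions. The two coincide for the default `RangeIndex` used in this notebook, but not once the index has been filtered, sorted, or replaced. A minimal sketch, assuming a three-row frame that keeps the original labels 169 to 171 from the tips data:

```python
import pandas as pd

# Rows 169-171 of the tips data, keeping their original index labels
bills = pd.DataFrame({'total_bill': [10.63, 50.81, 15.81]}, index=[169, 170, 171])

# idxmax gives the label of the largest value
label = bills['total_bill'].idxmax()
by_label = bills.loc[label]

# The integer position is a different number here
position = int(bills['total_bill'].to_numpy().argmax())
by_position = bills.iloc[position]

# Passing the label to iloc fails: there is no row at position 170
try:
    bills.iloc[label]
    iloc_with_label_failed = False
except IndexError:
    iloc_with_label_failed = True

print(label, position, iloc_with_label_failed)
```

Pairing `idxmax` with `.loc` therefore works regardless of how the index is labeled.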

<a id='v_c'></a>
### value_counts

A quick way to count how many times each distinct value occurs in a column. It is most useful on categorical columns.

In [103]:
df.head()

Unnamed: 0,total_bill,tip,gender,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,2959
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,4608
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,4458
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,5260
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,2251


In [105]:
df['gender'].unique()

array(['Female', 'Male'], dtype=object)

In [104]:
df['gender'].value_counts()

gender
Male      157
Female     87
Name: count, dtype: int64

In [106]:
df['day'].unique()

array(['Sun', 'Sat', 'Thur', 'Fri'], dtype=object)

In [107]:
df['day'].value_counts()

day
Sat     87
Sun     76
Thur    62
Fri     19
Name: count, dtype: int64
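
Passing `normalize=True` returns proportions instead of raw counts. The sketch below rebuilds a `day` column with the same counts as shown above (87, 76, 62, and 19); the Series itself is constructed for the example rather than read from the CSV.

```python
import pandas as pd

# Rebuild a day column with the counts shown above
days = pd.Series(['Sat'] * 87 + ['Sun'] * 76 + ['Thur'] * 62 + ['Fri'] * 19, name='day')

# Raw counts per category
counts = days.value_counts()

# Share of all rows per category; the shares add up to 1
shares = days.value_counts(normalize=True)

print(counts)
print(shares)
```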

<a id='replace'></a>

### replace

Quickly replace one or more values with another value.

In [108]:
df.head()

Unnamed: 0,total_bill,tip,gender,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,2959
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,4608
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,4458
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,5260
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,2251


In [119]:
df['Tip Quality'] = np.vectorize(quality)(df['total_bill'], df['tip'])

In [121]:
df.tail(15)

Unnamed: 0,total_bill,tip,gender,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,Tip Quality
229,22.12,2.88,Female,Yes,Sat,Dinner,2,11.06,Jennifer Russell,4793003293608,3943,Other
230,24.01,2.0,Male,Yes,Sat,Dinner,4,6.0,Michael Osborne,4258682154026,7872,Other
231,15.69,3.0,Male,Yes,Sat,Dinner,3,5.23,Jason Parks,4812333796161,6334,Other
232,11.61,3.39,Male,No,Sat,Dinner,2,5.8,James Taylor,6011482917327995,2124,Generous
233,10.77,1.47,Male,No,Sat,Dinner,2,5.38,Paul Novak,6011698897610858,1467,Other
234,15.53,3.0,Male,Yes,Sat,Dinner,2,7.76,Tracy Douglas,4097938155941930,7220,Other
235,10.07,1.25,Male,No,Sat,Dinner,2,5.04,Sean Gonzalez,3534021246117605,4615,Other
236,12.6,1.0,Male,Yes,Sat,Dinner,2,6.3,Matthew Myers,3543676378973965,5032,Other
237,32.83,1.17,Male,Yes,Sat,Dinner,2,16.42,Thomas Brown,4284722681265508,2929,Other
238,35.83,4.67,Female,No,Sat,Dinner,3,11.94,Kimberly Crane,676184013727,9777,Other


In [112]:
df['Tip Quality'].value_counts()

Tip Quality
Other       234
Generous     10
Name: count, dtype: int64

In [122]:
df['Tip Quality'].replace(to_replace='Other',value='Ok')

0      Ok
1      Ok
2      Ok
3      Ok
4      Ok
       ..
239    Ok
240    Ok
241    Ok
242    Ok
243    Ok
Name: Tip Quality, Length: 244, dtype: object

In [123]:
df['Tip Quality'] = df['Tip Quality'].replace(to_replace=['Other','other','OTHER'],value='Ok')

In [124]:
df.tail(15)

Unnamed: 0,total_bill,tip,gender,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,Tip Quality
229,22.12,2.88,Female,Yes,Sat,Dinner,2,11.06,Jennifer Russell,4793003293608,3943,Ok
230,24.01,2.0,Male,Yes,Sat,Dinner,4,6.0,Michael Osborne,4258682154026,7872,Ok
231,15.69,3.0,Male,Yes,Sat,Dinner,3,5.23,Jason Parks,4812333796161,6334,Ok
232,11.61,3.39,Male,No,Sat,Dinner,2,5.8,James Taylor,6011482917327995,2124,Generous
233,10.77,1.47,Male,No,Sat,Dinner,2,5.38,Paul Novak,6011698897610858,1467,Ok
234,15.53,3.0,Male,Yes,Sat,Dinner,2,7.76,Tracy Douglas,4097938155941930,7220,Ok
235,10.07,1.25,Male,No,Sat,Dinner,2,5.04,Sean Gonzalez,3534021246117605,4615,Ok
236,12.6,1.0,Male,Yes,Sat,Dinner,2,6.3,Matthew Myers,3543676378973965,5032,Ok
237,32.83,1.17,Male,Yes,Sat,Dinner,2,16.42,Thomas Brown,4284722681265508,2929,Ok
238,35.83,4.67,Female,No,Sat,Dinner,3,11.94,Kimberly Crane,676184013727,9777,Ok


In [125]:
df['Tip Quality'].value_counts()

Tip Quality
Ok          234
Generous     10
Name: count, dtype: int64

<a id='uni'></a>
### unique and nunique

In [126]:
df['size'].unique()

array([2, 3, 4, 1, 6, 5])

In [128]:
df['size'].nunique()

6

In [127]:
df['gender'].unique()

array(['Female', 'Male'], dtype=object)

In [129]:
df['gender'].nunique()

2

In [130]:
df['time'].unique()

array(['Dinner', 'Lunch'], dtype=object)

In [131]:
df['time'].nunique()

2

In [132]:
df['day'].unique()

array(['Sun', 'Sat', 'Thur', 'Fri'], dtype=object)

In [133]:
df['day'].nunique()

4

<a id='map'></a>
### map

In [134]:
my_map = {'Dinner':'D','Lunch':'L'}

In [135]:
df['time'].map(my_map)

0      D
1      D
2      D
3      D
4      D
      ..
239    D
240    D
241    D
242    D
243    D
Name: time, Length: 244, dtype: object

In [136]:
df['map'] = df['time'].map(my_map)

In [137]:
df.head()

Unnamed: 0,total_bill,tip,gender,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,Tip Quality,map
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,2959,Ok,D
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,4608,Ok,D
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,4458,Ok,D
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,5260,Ok,D
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,2251,Ok,D
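
One difference between `map` and `replace` is worth knowing: with a dictionary, `map` turns any value that is missing from the dictionary into `NaN`, while `replace` leaves such values unchanged. A minimal sketch, assuming a hypothetical `'Brunch'` value that does not occur in the tips data:

```python
import pandas as pd

my_map = {'Dinner': 'D', 'Lunch': 'L'}

# 'Brunch' is a made-up value with no entry in my_map
times = pd.Series(['Dinner', 'Lunch', 'Brunch'], name='time')

# map: values without a dictionary entry become NaN
mapped = times.map(my_map)

# replace: values without a dictionary entry are kept as they are
replaced = times.replace(my_map)

print(mapped.tolist())
print(replaced.tolist())
```

In the tips data every value of `time` has an entry in `my_map`, so both methods would give the same result there.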


<a id='dup'></a>
## Duplicates

### .duplicated() and .drop_duplicates()

In [138]:
# Returns True for every row that repeats an earlier row (the first occurrence is False)
df.duplicated()

0      False
1      False
2      False
3      False
4      False
       ...  
239    False
240    False
241    False
242    False
243    False
Length: 244, dtype: bool

In [139]:
simple_df = pd.DataFrame([1,2,2],['a','b','c'])

In [140]:
simple_df

Unnamed: 0,0
a,1
b,2
c,2


In [141]:
simple_df.duplicated()

a    False
b    False
c     True
dtype: bool

In [142]:
simple_df.drop_duplicates()

Unnamed: 0,0
a,1
b,2


In [143]:
simple_df

Unnamed: 0,0
a,1
b,2
c,2


In [144]:
simple_df.drop_duplicates(inplace=True)
# or 
# simple_df = simple_df.drop_duplicates()

In [145]:
simple_df

Unnamed: 0,0
a,1
b,2
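
Both methods take a `keep` parameter that controls which occurrence counts as the original: `'first'` (the default), `'last'`, or `False` to flag every repeated row. A minimal sketch, assuming the same three values as `simple_df` but with a column named `value` for readability:

```python
import pandas as pd

# Same values as simple_df above, with an explicit column name
simple_df = pd.DataFrame({'value': [1, 2, 2]}, index=['a', 'b', 'c'])

# Default: the first occurrence is kept, later repeats are flagged
first = simple_df.duplicated()

# keep='last': the last occurrence is kept, earlier repeats are flagged
last = simple_df.duplicated(keep='last')

# keep=False: every row that has a duplicate is flagged
every = simple_df.duplicated(keep=False)

# drop_duplicates accepts the same parameter
kept_last = simple_df.drop_duplicates(keep='last')

print(first.tolist(), last.tolist(), every.tolist())
print(kept_last)
```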


<a id='bet'></a>
## between

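`between` returns a boolean Series that is True where a value lies between two boundaries. Its parameters:<br>
left: a scalar value that defines the left boundary<br>
right: a scalar value that defines the right boundary<br>
inclusive: a string, one of 'both', 'left', 'right', or 'neither', that controls which boundaries are included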

In [None]:
df.head()

Unnamed: 0,total_bill,tip,gender,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,Tip Quality,map
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,2959,Ok,D
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,4608,Ok,D
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,4458,Ok,D
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,5260,Ok,D
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,2251,Ok,D


In [147]:
df['total_bill'].between(15,20,inclusive='both')

0       True
1      False
2      False
3      False
4      False
       ...  
239    False
240    False
241    False
242     True
243     True
Name: total_bill, Length: 244, dtype: bool

In [148]:
df[df['total_bill'].between(15,20,inclusive='neither')]

Unnamed: 0,total_bill,tip,gender,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,Tip Quality,map
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,2959,Ok,D
8,15.04,1.96,Male,No,Sun,Dinner,2,7.52,Joseph Mcdonald,3522866365840377,6820,Ok,D
12,15.42,1.57,Male,No,Sun,Dinner,2,7.71,Chad Harrington,577040572932,1300,Ok,D
13,18.43,3.00,Male,No,Sun,Dinner,4,4.61,Joshua Jones,6011163105616890,2971,Ok,D
17,16.29,3.71,Male,No,Sun,Dinner,3,5.43,John Pittman,6521340257218708,2998,Ok,D
...,...,...,...,...,...,...,...,...,...,...,...,...,...
225,16.27,2.50,Female,Yes,Fri,Lunch,2,8.14,Whitney Arnold,3579111947217428,6665,Ok,L
231,15.69,3.00,Male,Yes,Sat,Dinner,3,5.23,Jason Parks,4812333796161,6334,Ok,D
234,15.53,3.00,Male,Yes,Sat,Dinner,2,7.76,Tracy Douglas,4097938155941930,7220,Ok,D
242,17.82,1.75,Male,No,Sat,Dinner,2,8.91,Dennis Dixon,4375220550950,at17,Ok,D


In [None]:
df[df['total_bill'].between(15,20,inclusive='left')].shape  # rows with 15 <= total_bill < 20

(67, 13)
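
The same filter can be written with comparison operators. As a small check on hypothetical data (`bills` is not from tips.csv), the sketch below compares `between(..., inclusive='left')` with the explicit `>=` and `<` version, and counts the matches with `.sum()` instead of filtering and calling `.shape`.

```python
import pandas as pd

# Hypothetical bill amounts around the 15 and 20 boundaries.
bills = pd.Series([14.99, 15.0, 17.5, 20.0, 25.0])

via_between = bills.between(15, 20, inclusive='left')
via_operators = (bills >= 15) & (bills < 20)

# Both masks select exactly the same rows.
print(via_between.equals(via_operators))  # True

# Summing a boolean mask counts the True values.
print(via_between.sum())  # 2
```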

# Discussed on 08-Mar-2025

In [1]:
import numpy as np
import pandas as pd

In [2]:
df = pd.read_csv('data/tips.csv')

<a id='sample'></a>
## sample

In [3]:
df.shape

(244, 11)

In [4]:
df.sample(5)

Unnamed: 0,total_bill,tip,gender,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
157,25.0,3.75,Female,No,Sun,Dinner,4,6.25,Laura Robles,213158685144262,Sun7015
143,27.05,5.0,Female,No,Thur,Lunch,6,4.51,Regina Jones,4311048695487,Thur6179
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959
97,12.03,1.5,Male,Yes,Fri,Dinner,2,6.02,Eric Herrera,580116092652,Fri9268
172,7.25,5.15,Male,Yes,Sun,Dinner,2,3.62,Larry White,30432617123103,Sun9209


In [5]:
df.sample(frac=0.1)

Unnamed: 0,total_bill,tip,gender,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
59,48.27,6.73,Male,No,Sat,Dinner,4,12.07,Brian Ortiz,6596453823950595,Sat8139
133,12.26,2.0,Female,No,Thur,Lunch,2,6.13,Kaitlin Wolf,676348318145,Thur1561
91,22.49,3.5,Male,No,Fri,Dinner,2,11.24,Earl Horn,6011849326227398,Fri5700
28,21.7,4.3,Male,No,Sat,Dinner,2,10.85,David Collier,5529694315416009,Sat3697
137,14.15,2.0,Female,No,Thur,Lunch,2,7.08,Vanessa Morris,213189344156819,Thur3890
238,35.83,4.67,Female,No,Sat,Dinner,3,11.94,Kimberly Crane,676184013727,Sat9777
81,16.66,3.4,Male,No,Thur,Lunch,2,8.33,William Martin,4550549048402707,Thur8232
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260
153,24.55,2.0,Male,No,Sun,Dinner,4,6.14,Todd Patterson,4416804908942159,Sun8670
55,19.49,3.51,Male,No,Sun,Dinner,2,9.74,Michael Hamilton,6502227786581768,Sun1118


In [6]:
244*0.1  # frac=0.1 asks for 24.4 rows; sample() rounds this to a whole number

24.400000000000002

In [7]:
df.sample(frac=0.1).shape

(24, 11)
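
Since `sample` draws different rows on every call, it can help to fix `random_state` when a result needs to be reproducible. The sketch below uses a hypothetical stand-in frame with the same number of rows as `df` (the column `x` is made up) to show both the rounding of `frac` and the effect of `random_state`.

```python
import pandas as pd

# Hypothetical stand-in with 244 rows, like the tips data.
frame = pd.DataFrame({'x': range(244)})

# frac=0.1 asks for 24.4 rows, which is rounded to 24.
tenth = frame.sample(frac=0.1, random_state=0)
print(len(tenth))  # 24

# The same random_state yields the same rows on every call.
again = frame.sample(frac=0.1, random_state=0)
print(tenth.index.equals(again.index))  # True
```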

In [8]:
dict2 = {'a':[1,2,3,4,5],'b':[10,20,30,40,50],'c':[100,200,300,400,500]}

In [9]:
df2 = pd.DataFrame(dict2)

In [10]:
df2

Unnamed: 0,a,b,c
0,1,10,100
1,2,20,200
2,3,30,300
3,4,40,400
4,5,50,500


In [None]:
df2.sample(10,replace=True,random_state=42) # Over-sampling: with replace=True the sample can be larger than the 5-row DataFrame

Unnamed: 0,a,b,c
3,4,40,400
4,5,50,500
2,3,30,300
4,5,50,500
4,5,50,500
1,2,20,200
2,3,30,300
2,3,30,300
2,3,30,300
4,5,50,500
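
Over-sampling only works with `replace=True`. As a minimal sketch on a hypothetical five-row frame (`small` stands in for `df2`), the code below shows that asking for more rows than exist fails without replacement and succeeds with it.

```python
import pandas as pd

# Hypothetical five-row frame, similar to df2.
small = pd.DataFrame({'a': [1, 2, 3, 4, 5]})

# Without replacement, asking for more rows than exist raises a ValueError.
try:
    small.sample(10)
    raised = False
except ValueError:
    raised = True
print(raised)  # True

# With replacement, rows can repeat, so the sample may exceed the frame's length.
oversampled = small.sample(10, replace=True, random_state=42)
print(oversampled.shape)            # (10, 1)
print(oversampled.index.is_unique)  # False
```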


<a id='n'></a>
## nlargest and nsmallest

In [19]:
df.nlargest(4,'tip')

Unnamed: 0,total_bill,tip,gender,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
170,50.81,10.0,Male,Yes,Sat,Dinner,3,16.94,Gregory Clark,5473850968388236,Sat1954
212,48.33,9.0,Male,No,Sat,Dinner,4,12.08,Alex Williamson,676218815212,Sat4590
23,39.42,7.58,Male,No,Sat,Dinner,4,9.86,Lance Peterson,3542584061609808,Sat239
59,48.27,6.73,Male,No,Sat,Dinner,4,12.07,Brian Ortiz,6596453823950595,Sat8139


In [20]:
df.nlargest(5,'price_per_person')

Unnamed: 0,total_bill,tip,gender,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
184,40.55,3.0,Male,Yes,Sun,Dinner,2,20.27,Stephen Cox,3547798222044029,Sun5140
179,34.63,3.55,Male,Yes,Sun,Dinner,2,17.32,Brian Bailey,346656312114848,Sun9851
170,50.81,10.0,Male,Yes,Sat,Dinner,3,16.94,Gregory Clark,5473850968388236,Sat1954
175,32.9,3.11,Male,Yes,Sun,Dinner,2,16.45,Nathan Reynolds,370307040837149,Sun5109
237,32.83,1.17,Male,Yes,Sat,Dinner,2,16.42,Thomas Brown,4284722681265508,Sat2929


In [21]:
df.nlargest(10,'tip').shape

(10, 11)
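
`nlargest(n, column)` gives the same rows as sorting by that column in descending order and taking the first `n` with `head`. Here is a small check on hypothetical tip values (the `tips` frame below is made up and has no ties).

```python
import pandas as pd

# Hypothetical tip values without ties.
tips = pd.DataFrame({'tip': [1.0, 5.15, 3.5, 10.0, 2.0]})

top3 = tips.nlargest(3, 'tip')
sorted_top3 = tips.sort_values('tip', ascending=False).head(3)

print(top3.index.tolist())       # [3, 1, 2]
print(top3.equals(sorted_top3))  # True
```

`nlargest` states the intent in one call and does not require sorting the whole DataFrame first.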

In [18]:
df.nsmallest(3,'tip')

Unnamed: 0,total_bill,tip,gender,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
67,3.07,1.0,Female,Yes,Sat,Dinner,1,3.07,Tiffany Brock,4359488526995267,Sat3455
92,5.75,1.0,Female,Yes,Fri,Dinner,2,2.88,Leah Ramirez,3508911676966392,Fri3780
111,7.25,1.0,Female,No,Sat,Dinner,1,7.25,Terri Jones,3559221007826887,Sat4801


In [25]:
df.nsmallest(4,'tip')

Unnamed: 0,total_bill,tip,gender,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
67,3.07,1.0,Female,Yes,Sat,Dinner,1,3.07,Tiffany Brock,4359488526995267,Sat3455
92,5.75,1.0,Female,Yes,Fri,Dinner,2,2.88,Leah Ramirez,3508911676966392,Fri3780
111,7.25,1.0,Female,No,Sat,Dinner,1,7.25,Terri Jones,3559221007826887,Sat4801
236,12.6,1.0,Male,Yes,Sat,Dinner,2,6.3,Matthew Myers,3543676378973965,Sat5032


In [27]:
df.nsmallest(2,'tip')

Unnamed: 0,total_bill,tip,gender,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
67,3.07,1.0,Female,Yes,Sat,Dinner,1,3.07,Tiffany Brock,4359488526995267,Sat3455
92,5.75,1.0,Female,Yes,Fri,Dinner,2,2.88,Leah Ramirez,3508911676966392,Fri3780


In [28]:
df.nsmallest(2,['tip','size'])

Unnamed: 0,total_bill,tip,gender,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
67,3.07,1.0,Female,Yes,Sat,Dinner,1,3.07,Tiffany Brock,4359488526995267,Sat3455
111,7.25,1.0,Female,No,Sat,Dinner,1,7.25,Terri Jones,3559221007826887,Sat4801
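
Several rows in the tips data share the smallest tip of 1.0, so which rows come back depends on how ties are handled. The sketch below uses a hypothetical frame with a three-way tie to compare the default behaviour, a second tie-breaking column, and the `keep='all'` option.

```python
import pandas as pd

# Hypothetical rows with a three-way tie on 'tip'.
tips = pd.DataFrame({
    'tip':  [1.0, 1.0, 1.0, 2.5],
    'size': [2, 1, 1, 3],
})

# Default keep='first': ties are resolved by order of appearance.
by_tip = tips.nsmallest(2, 'tip')
print(by_tip.index.tolist())  # [0, 1]

# A second column breaks the tie, so the two size-1 rows are chosen.
by_tip_then_size = tips.nsmallest(2, ['tip', 'size'])
print(by_tip_then_size.index.tolist())  # [1, 2]

# keep='all' returns every tied row, even if that means more than n rows.
with_ties = tips.nsmallest(2, 'tip', keep='all')
print(with_ties.index.tolist())  # [0, 1, 2]
```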
