# All Pandas All The Time

Pandas is a library we're going to be using pretty much every day in this course, so we're going to do a ton of practice so you can be on your way to becoming a _PANDAS MASTER_.

![Kung fu panda excited](https://data.whicdn.com/images/201331793/original.gif)

Let's continue with the data from the Austin Animal Shelter. 

Data source: [intakes data](https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Intakes/wter-evkm) and [outcomes data](https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Outcomes/9t4d-g238).

Once again, we're starting off with the intake data, which describes the animals as they enter the shelter.

In [1]:
# Imports! Can't use pandas unless we bring it into our notebook
import pandas as pd

In [2]:
!ls data/

Austin_Animal_Center_Intakes_030921.csv
Austin_Animal_Center_Outcomes_030921.csv
flights.db
titanic.csv


In [3]:
# Grab the data, naming the dataframe 'intakes' this time
# Don't forget to read in DateTime as a datetime column
intakes = pd.read_csv('data/Austin_Animal_Center_Intakes_030921.csv', parse_dates=["DateTime"])

In [4]:
# Check out the first few rows
intakes.head()

Unnamed: 0,Animal ID,Name,DateTime,MonthYear,Found Location,Intake Type,Intake Condition,Animal Type,Sex upon Intake,Age upon Intake,Breed,Color
0,A786884,*Brock,2019-01-03 16:19:00,01/03/2019 04:19:00 PM,2501 Magin Meadow Dr in Austin (TX),Stray,Normal,Dog,Neutered Male,2 years,Beagle Mix,Tricolor
1,A706918,Belle,2015-07-05 12:59:00,07/05/2015 12:59:00 PM,9409 Bluegrass Dr in Austin (TX),Stray,Normal,Dog,Spayed Female,8 years,English Springer Spaniel,White/Liver
2,A724273,Runster,2016-04-14 18:43:00,04/14/2016 06:43:00 PM,2818 Palomino Trail in Austin (TX),Stray,Normal,Dog,Intact Male,11 months,Basenji Mix,Sable/White
3,A665644,,2013-10-21 07:59:00,10/21/2013 07:59:00 AM,Austin (TX),Stray,Sick,Cat,Intact Female,4 weeks,Domestic Shorthair Mix,Calico
4,A682524,Rio,2014-06-29 10:38:00,06/29/2014 10:38:00 AM,800 Grove Blvd in Austin (TX),Stray,Normal,Dog,Neutered Male,4 years,Doberman Pinsch/Australian Cattle Dog,Tan/Gray


In [5]:
# Check information on the dataframe
intakes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 124222 entries, 0 to 124221
Data columns (total 12 columns):
 #   Column            Non-Null Count   Dtype         
---  ------            --------------   -----         
 0   Animal ID         124222 non-null  object        
 1   Name              85158 non-null   object        
 2   DateTime          124222 non-null  datetime64[ns]
 3   MonthYear         124222 non-null  object        
 4   Found Location    124222 non-null  object        
 5   Intake Type       124222 non-null  object        
 6   Intake Condition  124222 non-null  object        
 7   Animal Type       124222 non-null  object        
 8   Sex upon Intake   124221 non-null  object        
 9   Age upon Intake   124222 non-null  object        
 10  Breed             124222 non-null  object        
 11  Color             124222 non-null  object        
dtypes: datetime64[ns](1), object(11)
memory usage: 11.4+ MB


Let's do some of the transformations we did last time: dropping the MonthYear column, and changing column names to be lowercase without spaces.

In [6]:
# Drop MonthYear
intakes = intakes.drop(columns='MonthYear')

In [7]:
# Rename columns
intakes = intakes.rename(columns=lambda x: x.replace(" ","_").lower())

In [8]:
# Sanity check
intakes.head()

Unnamed: 0,animal_id,name,datetime,found_location,intake_type,intake_condition,animal_type,sex_upon_intake,age_upon_intake,breed,color
0,A786884,*Brock,2019-01-03 16:19:00,2501 Magin Meadow Dr in Austin (TX),Stray,Normal,Dog,Neutered Male,2 years,Beagle Mix,Tricolor
1,A706918,Belle,2015-07-05 12:59:00,9409 Bluegrass Dr in Austin (TX),Stray,Normal,Dog,Spayed Female,8 years,English Springer Spaniel,White/Liver
2,A724273,Runster,2016-04-14 18:43:00,2818 Palomino Trail in Austin (TX),Stray,Normal,Dog,Intact Male,11 months,Basenji Mix,Sable/White
3,A665644,,2013-10-21 07:59:00,Austin (TX),Stray,Sick,Cat,Intact Female,4 weeks,Domestic Shorthair Mix,Calico
4,A682524,Rio,2014-06-29 10:38:00,800 Grove Blvd in Austin (TX),Stray,Normal,Dog,Neutered Male,4 years,Doberman Pinsch/Australian Cattle Dog,Tan/Gray


## Dealing with Dirty Data

It is a fact of the data science life - you will always be surrounded by 'dirty' data. What does it mean for data to be 'dirty'? What are some of the various ways that data can be 'dirty'?

- 


In [9]:
# Check for null values recognized by pandas as blank
intakes.isna().sum()

animal_id               0
name                39064
datetime                0
found_location          0
intake_type             0
intake_condition        0
animal_type             0
sex_upon_intake         1
age_upon_intake         0
breed                   0
color                   0
dtype: int64

There is no one way to deal with null values. What are some of the strategies we can use to deal with them?

- fill nulls with something that shows the value is missing ('unknown', 0)
- fill nulls with average or median
- drop those rows/columns
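
As a rough sketch of those three strategies side by side, here is a tiny made-up dataframe (the `toy` frame and its `weight_lb` column are invented for illustration and are not part of the shelter data):

```python
import pandas as pd

# Toy data standing in for the shelter table (values are made up)
toy = pd.DataFrame({
    "name": ["Belle", None, "Rio", None],
    "weight_lb": [40.0, None, 55.0, 12.0],
})

# Strategy 1: fill with a placeholder that flags the value as missing
placeholder = toy["name"].fillna("unknown")

# Strategy 2: fill a numeric column with its median
median_filled = toy["weight_lb"].fillna(toy["weight_lb"].median())

# Strategy 3: drop rows that are missing a value in a chosen column
dropped = toy.dropna(subset=["weight_lb"])
```

Which one is right depends on the column: a placeholder keeps every row, a median only makes sense for numbers, and dropping is safest when very few rows are affected.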


How, in Pandas, can we fill null values recognized by Pandas as null? Let's practice by filling nulls in the `name` column with a placeholder value, like 'No Name'.

Helpful link: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html

In [10]:
# Code here to fill nulls in the Name column
intakes['name'] = intakes['name'].fillna(value='No Name')

Now let's check for nulls again...

In [11]:
# Sanity check
intakes.isna().sum()

animal_id           0
name                0
datetime            0
found_location      0
intake_type         0
intake_condition    0
animal_type         0
sex_upon_intake     1
age_upon_intake     0
breed               0
color               0
dtype: int64

Let's try a different strategy for the one lonely null in the 'Sex upon Intake' column - let's just drop that row, since it's only one observation.

Helpful link: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html

In [12]:
len(intakes)

124222

In [13]:
# Code here to drop the whole row where Sex upon Intake is null
intakes = intakes.dropna(subset=['sex_upon_intake'])

In [14]:
# Copy/paste code from above to re-check for nulls
intakes.isna().sum()

animal_id           0
name                0
datetime            0
found_location      0
intake_type         0
intake_condition    0
animal_type         0
sex_upon_intake     0
age_upon_intake     0
breed               0
color               0
dtype: int64

How do we find sneaky null or nonsense values that aren't marked by Pandas as null?

In [15]:
# Run this cell without changes
intakes["age_upon_intake"].value_counts()

1 year       21809
2 years      19069
1 month      11910
3 years       7456
2 months      6735
4 years       4458
4 weeks       4414
5 years       4064
3 weeks       3620
3 months      3279
4 months      3203
5 months      3073
6 years       2717
2 weeks       2498
6 months      2396
7 years       2335
8 years       2284
7 months      1860
10 years      1828
9 months      1824
8 months      1500
9 years       1331
10 months     1028
1 week        1022
1 weeks        888
12 years       880
11 months      795
0 years        754
11 years       746
1 day          635
3 days         578
13 years       575
2 days         479
14 years       383
15 years       336
4 days         328
5 weeks        315
6 days         305
5 days         180
16 years       140
17 years        82
18 years        47
19 years        27
20 years        19
-1 years         5
22 years         5
25 years         1
-2 years         1
21 years         1
23 years         1
-3 years         1
24 years         1
Name: age_up

In [16]:
intakes['age_upon_intake'].unique()

array(['2 years', '8 years', '11 months', '4 weeks', '4 years', '6 years',
       '5 months', '14 years', '1 month', '2 months', '18 years',
       '4 months', '1 year', '6 months', '3 years', '4 days', '1 day',
       '5 years', '2 weeks', '15 years', '7 years', '3 weeks', '3 months',
       '12 years', '1 week', '9 months', '10 years', '10 months',
       '7 months', '9 years', '8 months', '1 weeks', '5 days', '2 days',
       '11 years', '0 years', '17 years', '3 days', '13 years', '5 weeks',
       '19 years', '6 days', '16 years', '20 years', '-1 years',
       '22 years', '23 years', '-2 years', '21 years', '-3 years',
       '25 years', '24 years'], dtype=object)

Analyze the values you're finding in the 'Age upon Intake' column. What doesn't quite fit here?

**Note:** using `.value_counts()` is just one way to look at the values of a column. In this case, it works because we can see which values are the most common, and it's verbose enough to show even the less common values that might be problematic.
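
Another option, sketched here on a handful of sample strings, is to describe what a valid value should look like and flag everything else. The regular expression is our own assumption about the expected format (a non-negative whole number, a space, then a unit), not something the data source defines:

```python
import pandas as pd

ages = pd.Series(["2 years", "1 weeks", "-1 years", "0 years", "4 weeks"])

# Flag anything that is not "<non-negative integer> <unit>"
looks_valid = ages.str.match(r"^\d+ (day|week|month|year)s?$")
suspicious = ages[~looks_valid]
```

Note that this catches the negative age but not '1 weeks' or '0 years', which are odd without breaking the format, so a pattern check complements `.value_counts()` rather than replacing it.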

So - how do we want to deal with the data in here that doesn't make sense?

- 


What if our goal is creating a column with a common standard for age, one which we could sort to see which animals are the oldest or youngest?

First, let's see what that would look like if we try it as the column is now:

In [17]:
# Run this cell without changes
intakes['age_upon_intake'].sort_values(ascending=True).unique()

array(['-1 years', '-2 years', '-3 years', '0 years', '1 day', '1 month',
       '1 week', '1 weeks', '1 year', '10 months', '10 years',
       '11 months', '11 years', '12 years', '13 years', '14 years',
       '15 years', '16 years', '17 years', '18 years', '19 years',
       '2 days', '2 months', '2 weeks', '2 years', '20 years', '21 years',
       '22 years', '23 years', '24 years', '25 years', '3 days',
       '3 months', '3 weeks', '3 years', '4 days', '4 months', '4 weeks',
       '4 years', '5 days', '5 months', '5 weeks', '5 years', '6 days',
       '6 months', '6 years', '7 months', '7 years', '8 months',
       '8 years', '9 months', '9 years'], dtype=object)

Let's unpack what is happening in that line of code: we take the `age_upon_intake` column by itself (as a Series), sort the values from lowest to highest (`ascending=True`), then grab only the unique results so we can see how it ordered the values without looking through all 124,221 rows.

Does that do what we want it to? Let's discuss how this worked - how did it sort?

- 
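
A small sketch of what is going on, using four sample strings: the column holds text, so pandas compares values character by character rather than as numbers.

```python
import pandas as pd

ages = pd.Series(["10 years", "2 years", "1 day", "9 months"])

# Strings sort character by character, so "10 years" lands before "2 years"
as_text = ages.sort_values().tolist()

# Sorting on the numeric part alone orders the numbers correctly
# (this ignores the unit, so it is only meaningful within one unit)
numbers = ages.str.split(" ").str[0].astype(int)
as_numbers = numbers.sort_values().tolist()
```
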


To make our problem a bit easier, without dealing with the different ways that age is broken out, let's only look at animals where the age is given in years. How can we do that?
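
One possible approach, shown on a made-up four-row frame rather than the real `intakes` data, is a boolean mask built with `.str.contains`:

```python
import pandas as pd

toy = pd.DataFrame({"age_upon_intake": ["2 years", "11 months", "1 year", "4 weeks"]})

# "year" matches both "1 year" and "2 years"
is_in_years = toy["age_upon_intake"].str.contains("year")
in_years = toy[is_in_years]
```
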

In [18]:
# Code here to grab only the animals where age is given in years


In [19]:
# Check the shape of this subset dataframe


In [20]:
# Sanity check


Can we grab only the number of years from this? Let's make a new column where we can put this data.
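
Here is a minimal sketch on toy data. The column name `age_in_years` is just our choice, and calling `.copy()` on the subset is one common way (not the only one) to avoid a `SettingWithCopyWarning` when adding a column to a filtered dataframe:

```python
import pandas as pd

toy = pd.DataFrame({"age_upon_intake": ["2 years", "11 months", "1 year", "-1 years"]})

# .copy() makes the subset an independent dataframe instead of a slice of toy
years_only = toy[toy["age_upon_intake"].str.contains("year")].copy()

# Split "2 years" into ["2", "years"], keep the first piece, convert to integer
years_only["age_in_years"] = (
    years_only["age_upon_intake"].str.split(" ").str[0].astype(int)
)
```
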

In [21]:
# Code here to make a new column, 'Age in Years'



# Did you get a 'SettingWithCopyWarning'? No worries - let's discuss

In [22]:
# Code here to transform that column to an integer


In [23]:
# Code here to check your work


In [24]:
# Code here to check some statistics on our now-numeric column


In [25]:
# Code here to check the unique values - in order!


In [26]:
# Let's check the mean for our now-numeric column


In [27]:
# Now let's check the median


Let's discuss this column - what does it mean that the mean and median are different? How will that change if we remove some of the nonsense numbers?

- 
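
To get a feel for it, here is a sketch with invented ages (not the shelter numbers). It uses `.clip(lower=0)` to floor negative values at zero, which is a stand-in for whatever cleaning you choose:

```python
import pandas as pd

age_in_years = pd.Series([-3, 0, 1, 1, 2, 2, 3, 20])

mean_before = age_in_years.mean()
median_before = age_in_years.median()

# Floor the nonsense negative ages at 0
cleaned = age_in_years.clip(lower=0)
mean_after = cleaned.mean()
median_after = cleaned.median()
```

In this toy example the mean moves from 3.25 to 3.625 while the median stays at 1.5: the mean reacts to every extreme value, the median mostly does not.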


In [28]:
# Code here to deal with those nonsense numbers
nonsense_years = ['-3 years', '-2 years', '-1 years']
intakes['age_upon_intake'] = intakes['age_upon_intake'].replace(nonsense_years, '0 years')

In [29]:
# Sanity check


In [30]:
# Code here to re-check your mean/median values


### Duplicates - another kind of dirty data (sometimes)

Some duplicates are legitimate, some are not - let's explore and discuss!

Let's go back to our full intakes dataframe.
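
As a quick sketch of the two kinds of checks, on a made-up four-row frame: `.duplicated()` with no arguments flags rows that repeat in every column, while passing `subset` flags repeats of just that column.

```python
import pandas as pd

toy = pd.DataFrame({
    "animal_id": ["A1", "A2", "A1", "A2"],
    "datetime": pd.to_datetime(
        ["2019-01-03", "2015-07-05", "2020-02-01", "2015-07-05"]
    ),
})

# Rows identical in every column (likely a data-entry repeat)
full_row_dupes = int(toy.duplicated().sum())

# Rows whose animal_id has appeared before (possibly a legitimate return visit)
id_dupes = int(toy.duplicated(subset=["animal_id"]).sum())
```
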

In [31]:
# Check for duplicates


In [32]:
# Now check specifically for Animal IDs that are duplicated


In [33]:
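# Handle duplicates - only take the 1st intake for each animal
# keep='first' keeps the first row in file order; sort by datetime
# beforehand if you need the chronologically earliest intake
# Save it as a new version, named clean_intakes
clean_intakes = intakes.drop_duplicates(subset=['animal_id'], keep='first')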

## Group By

We can use a `groupby` function to find out interesting patterns among groups in our data. Let's use one now to find the average age of each animal type in years.
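
A minimal sketch of the two steps, on toy data with an assumed numeric `age_in_years` column:

```python
import pandas as pd

toy = pd.DataFrame({
    "animal_type": ["Dog", "Cat", "Dog", "Cat"],
    "age_in_years": [2, 1, 8, 3],
})

# A groupby on its own returns a DataFrameGroupBy object; nothing is computed yet
grouped = toy.groupby("animal_type")

# Adding an aggregation collapses each group to one value
mean_age = grouped["age_in_years"].mean()
```
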

In [34]:
# Run just a groupby on the animal_type column - what's the output?


In [35]:
# Add an aggregation function


## Merging Dataframes

We were given two data sources here - both an Intakes and an Outcomes CSV. Let's merge them!

![Merge diagram from Data Science Made Simple](http://www.datasciencemadesimple.com/wp-content/uploads/2017/09/join-or-merge-in-python-pandas-1.png)

[Image from Data Science Made Simple's post on Joining/Merging Pandas Data Frames](http://www.datasciencemadesimple.com/join-merge-data-frames-pandas-python/)

In [36]:
# Read in our outcomes csv as a dataframe named outcomes


In [37]:
# Check out our outcomes data


What column should we use to merge these DataFrames?

- 


Let's do some quick cleaning on our outcomes dataframe...

In [38]:
# Change the 'DateTime' column here to be recognized as datetime objects


In [39]:
# Change column names to be lower case and remove spaces


In [40]:
# Drop duplicate animal IDs, keeping only the 1st
# Save this as clean_outcomes
clean_outcomes = None

In [41]:
# Sanity check


Now... let's merge!

In [42]:
# Code here to merge dataframes


In [43]:
# Code here to check out the details of our new dataframe


Let's discuss - can anyone guess why I had us remove duplicates before this merge? What would happen if I didn't? How could we make our combined_df better?

- 
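
Here is a small illustration with invented rows (the `intakes_toy` and `outcomes_toy` frames are hypothetical). When an ID repeats on both sides, a merge pairs every matching intake with every matching outcome:

```python
import pandas as pd

intakes_toy = pd.DataFrame({
    "animal_id": ["A1", "A1", "A2"],
    "intake_type": ["Stray", "Stray", "Owner Surrender"],
})
outcomes_toy = pd.DataFrame({
    "animal_id": ["A1", "A1", "A2"],
    "outcome_type": ["Return to Owner", "Adoption", "Adoption"],
})

# A1 appears twice on each side: 2 x 2 = 4 rows, plus 1 row for A2
with_dupes = intakes_toy.merge(outcomes_toy, on="animal_id", how="inner")

# After dropping duplicate IDs, each animal contributes exactly one row;
# validate="one_to_one" raises an error if either side still has repeats
deduped = intakes_toy.drop_duplicates(subset=["animal_id"]).merge(
    outcomes_toy.drop_duplicates(subset=["animal_id"]),
    on="animal_id",
    how="inner",
    validate="one_to_one",
)
```
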


## Level Up!

1. Find the **age in days** for all animals, not just the ones whose age is provided in years. Be sure to do this on the original dataframe, not just on subsets of the dataframe.

   - (Assume a year is 365 days, and a month is 30 days)

        
2. Ask a few questions of the combined dataframe that you couldn't figure out by just looking at the intakes or outcomes dataframes by themselves.

   - Example: Can you find out how long each animal in the combined dataframe has been in the shelter? 
        
       - Hint: Check out Date Time objects - a new data type that isn't a string or an integer, but which Pandas can recognize as time! https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html
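
For the time-in-shelter example, a sketch along these lines might help. The column names `datetime_intake` and `datetime_outcome` are assumptions about what the merged dataframe could contain, and the dates are made up:

```python
import pandas as pd

combined_toy = pd.DataFrame({
    "animal_id": ["A1", "A2"],
    "datetime_intake": pd.to_datetime(["2019-01-03 16:19", "2015-07-05 12:59"]),
    "datetime_outcome": pd.to_datetime(["2019-01-08 15:11", "2015-07-05 15:13"]),
})

# Subtracting two datetime columns gives a timedelta column
combined_toy["time_in_shelter"] = (
    combined_toy["datetime_outcome"] - combined_toy["datetime_intake"]
)

# .dt.days keeps only the whole-day part of each timedelta
combined_toy["days_in_shelter"] = combined_toy["time_in_shelter"].dt.days
```
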

In [44]:
# Code here to work on level up #1
clean_intakes['age_upon_intake']

0           2 years
1           8 years
2         11 months
3           4 weeks
4           4 years
            ...    
124217      2 years
124218       1 year
124219      4 years
124220      3 years
124221      2 years
Name: age_upon_intake, Length: 111012, dtype: object

In [45]:
clean_intakes.head()

Unnamed: 0,animal_id,name,datetime,found_location,intake_type,intake_condition,animal_type,sex_upon_intake,age_upon_intake,breed,color
0,A786884,*Brock,2019-01-03 16:19:00,2501 Magin Meadow Dr in Austin (TX),Stray,Normal,Dog,Neutered Male,2 years,Beagle Mix,Tricolor
1,A706918,Belle,2015-07-05 12:59:00,9409 Bluegrass Dr in Austin (TX),Stray,Normal,Dog,Spayed Female,8 years,English Springer Spaniel,White/Liver
2,A724273,Runster,2016-04-14 18:43:00,2818 Palomino Trail in Austin (TX),Stray,Normal,Dog,Intact Male,11 months,Basenji Mix,Sable/White
3,A665644,No Name,2013-10-21 07:59:00,Austin (TX),Stray,Sick,Cat,Intact Female,4 weeks,Domestic Shorthair Mix,Calico
4,A682524,Rio,2014-06-29 10:38:00,800 Grove Blvd in Austin (TX),Stray,Normal,Dog,Neutered Male,4 years,Doberman Pinsch/Australian Cattle Dog,Tan/Gray


In [46]:
clean_intakes['age_upon_intake'].map(lambda x: x.split(" "))

0           [2, years]
1           [8, years]
2         [11, months]
3           [4, weeks]
4           [4, years]
              ...     
124217      [2, years]
124218       [1, year]
124219      [4, years]
124220      [3, years]
124221      [2, years]
Name: age_upon_intake, Length: 111012, dtype: object

In [None]:
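# Split once per check: index [0] is the number, index [1] is the unit
clean_intakes['age_upon_intake'].map(lambda x: int(x.split(" ")[0]) * 365 if "year" in x.split(" ")[1]
                                     else (int(x.split(" ")[0]) * 30 if "month" in x.split(" ")[1]
                                     else (int(x.split(" ")[0]) * 7 if "week" in x.split(" ")[1]
                                     else int(x.split(" ")[0]))))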

In [48]:
def find_days(series):
    # Days per unit: year and month follow the Level Up assumptions; week = 7, day = 1
    days_per_unit = {'year': 365, 'month': 30, 'week': 7, 'day': 1}

    def to_days(age):
        # '11 months' -> number '11', unit 'months'; rstrip('s') gives 'month'
        number, unit = age.split(" ")
        return int(number) * days_per_unit[unit.rstrip('s')]

    return series.map(to_days)


clean_intakes['age_days'] = find_days(clean_intakes['age_upon_intake'])
clean_intakes.head(50)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  clean_intakes['age_days'] = find_days(clean_intakes['age_upon_intake'])


Unnamed: 0,animal_id,name,datetime,found_location,intake_type,intake_condition,animal_type,sex_upon_intake,age_upon_intake,breed,color,age_days
0,A786884,*Brock,2019-01-03 16:19:00,2501 Magin Meadow Dr in Austin (TX),Stray,Normal,Dog,Neutered Male,2 years,Beagle Mix,Tricolor,730
1,A706918,Belle,2015-07-05 12:59:00,9409 Bluegrass Dr in Austin (TX),Stray,Normal,Dog,Spayed Female,8 years,English Springer Spaniel,White/Liver,2920
2,A724273,Runster,2016-04-14 18:43:00,2818 Palomino Trail in Austin (TX),Stray,Normal,Dog,Intact Male,11 months,Basenji Mix,Sable/White,330
3,A665644,No Name,2013-10-21 07:59:00,Austin (TX),Stray,Sick,Cat,Intact Female,4 weeks,Domestic Shorthair Mix,Calico,28
4,A682524,Rio,2014-06-29 10:38:00,800 Grove Blvd in Austin (TX),Stray,Normal,Dog,Neutered Male,4 years,Doberman Pinsch/Australian Cattle Dog,Tan/Gray,1460
5,A743852,Odin,2017-02-18 12:46:00,Austin (TX),Owner Surrender,Normal,Dog,Neutered Male,2 years,Labrador Retriever Mix,Chocolate,730
6,A635072,Beowulf,2019-04-16 09:53:00,415 East Mary Street in Austin (TX),Public Assist,Normal,Dog,Neutered Male,6 years,Great Dane Mix,Black,2190
7,A708452,Mumble,2015-07-30 14:37:00,Austin (TX),Public Assist,Normal,Dog,Intact Male,2 years,Labrador Retriever Mix,Black/White,730
8,A818975,No Name,2020-06-18 14:53:00,Braker Lane And Metric in Travis (TX),Stray,Normal,Cat,Intact Male,4 weeks,Domestic Shorthair,Cream Tabby,28
9,A774147,No Name,2018-06-11 07:45:00,6600 Elm Creek in Austin (TX),Stray,Injured,Cat,Intact Female,4 weeks,Domestic Shorthair Mix,Black/White,28


In [49]:
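def find_days(series):
    # Caution: this attempt has two bugs worth spotting.
    # 1. x is the raw string (e.g. '2 years'), so x[0] and x[1] are single
    #    characters, not the number and the unit.
    # 2. `days` is overwritten on every pass and returned once at the end, so
    #    the function hands back one scalar (from the last row), not a Series.
    for x in series:
        if 'year' in x[1]:
            days = int(x[0]) * 365
        elif 'month' in x[1]:
            days = int(x[0]) * 30
        else:
            days = int(x[0])
    return days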

clean_intakes['age_days'] = find_days(clean_intakes['age_upon_intake'])
clean_intakes.tail(50)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  clean_intakes['age_days'] = find_days(clean_intakes['age_upon_intake'])


Unnamed: 0,animal_id,name,datetime,found_location,intake_type,intake_condition,animal_type,sex_upon_intake,age_upon_intake,breed,color,age_days
124170,A830109,Chopper,2021-03-07 09:00:00,Austin (TX),Public Assist,Normal,Dog,Intact Male,2 years,Pit Bull,Black/White,2
124171,A830329,No Name,2021-03-07 15:29:00,Fm 812 in Travis (TX),Stray,Sick,Dog,Intact Female,7 months,Chihuahua Shorthair,Blue,2
124172,A830157,*Newt,2021-03-03 13:48:00,13425 Fm 620 in Austin (TX),Stray,Injured,Cat,Intact Male,4 months,Domestic Shorthair Mix,Black/White,2
124173,A773315,Tofu,2021-03-08 15:10:00,5100 Caswell in Austin (TX),Stray,Injured,Dog,Neutered Male,3 years,Pit Bull Mix,White/Brown,2
124174,A830362,No Name,2021-03-08 15:53:00,1400 Brighton Circle in Austin (TX),Stray,Medical,Dog,Intact Male,5 years,American Pit Bull Terrier/Basset Hound,Blue/White,2
124176,A830379,No Name,2021-03-08 15:39:00,49Th And Airport in Austin (TX),Stray,Normal,Cat,Intact Female,2 years,Domestic Shorthair,Brown Tabby/White,2
124177,A804495,Peewee,2021-03-08 15:50:00,1127 Chicon Street in Austin (TX),Stray,Normal,Dog,Intact Male,3 years,Chihuahua Shorthair/Cairn Terrier,Black/Tan,2
124178,A830381,No Name,2021-03-08 15:41:00,8330 Linden Road in Travis (TX),Stray,Normal,Dog,Intact Male,2 years,Treeing Walker Coonhound/Great Pyrenees,Blue Tick,2
124179,A830384,No Name,2021-03-08 15:56:00,Austin (TX),Owner Surrender,Normal,Cat,Spayed Female,2 years,Domestic Shorthair,Torbie,2
124180,A828280,Reggie,2021-03-08 16:59:00,Austin (TX),Owner Surrender,Normal,Dog,Neutered Male,4 years,Labrador Retriever Mix,Yellow/White,2
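
Every row above ended up with the same `age_days` value, which suggests the loop is returning a single number rather than one per row. A different technique, sketched here on five sample strings, is to let pandas pull the number and unit apart with `.str.extract` and multiply by a lookup table. The `DAYS_PER_UNIT` name is ours; the year and month values follow the Level Up assumptions, while week = 7 and day = 1 are our own additions:

```python
import pandas as pd

ages = pd.Series(["2 years", "11 months", "4 weeks", "1 year", "3 days"])

# Days per unit (year/month from the exercise; week/day are assumptions)
DAYS_PER_UNIT = {"year": 365, "month": 30, "week": 7, "day": 1}

# Capture the number and the singular unit; a trailing "s" is simply not captured
parts = ages.str.extract(r"^(-?\d+) (day|week|month|year)", expand=True)

# Column 0 holds the number as text, column 1 holds the unit
age_days = parts[0].astype(int) * parts[1].map(DAYS_PER_UNIT)
```

Because this works on whole columns, there is no loop to get wrong, and the result lines up row for row with the original Series.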


In [50]:
pre_norm_age = clean_intakes['age_upon_intake'].map(lambda x: x.split(" "))
pre_norm_age

0           [2, years]
1           [8, years]
2         [11, months]
3           [4, weeks]
4           [4, years]
              ...     
124217      [2, years]
124218       [1, year]
124219      [4, years]
124220      [3, years]
124221      [2, years]
Name: age_upon_intake, Length: 111012, dtype: object

In [51]:
clean_intakes['age_value'] = pre_norm_age.map(lambda x: int(x[0]))
clean_intakes.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  clean_intakes['age_value'] = pre_norm_age.map(lambda x: int(x[0]))


Unnamed: 0,animal_id,name,datetime,found_location,intake_type,intake_condition,animal_type,sex_upon_intake,age_upon_intake,breed,color,age_days,age_value
0,A786884,*Brock,2019-01-03 16:19:00,2501 Magin Meadow Dr in Austin (TX),Stray,Normal,Dog,Neutered Male,2 years,Beagle Mix,Tricolor,2,2
1,A706918,Belle,2015-07-05 12:59:00,9409 Bluegrass Dr in Austin (TX),Stray,Normal,Dog,Spayed Female,8 years,English Springer Spaniel,White/Liver,2,8
2,A724273,Runster,2016-04-14 18:43:00,2818 Palomino Trail in Austin (TX),Stray,Normal,Dog,Intact Male,11 months,Basenji Mix,Sable/White,2,11
3,A665644,No Name,2013-10-21 07:59:00,Austin (TX),Stray,Sick,Cat,Intact Female,4 weeks,Domestic Shorthair Mix,Calico,2,4
4,A682524,Rio,2014-06-29 10:38:00,800 Grove Blvd in Austin (TX),Stray,Normal,Dog,Neutered Male,4 years,Doberman Pinsch/Australian Cattle Dog,Tan/Gray,2,4


In [52]:
clean_intakes['age_time'] = pre_norm_age.map(lambda x: x[1])
clean_intakes.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  clean_intakes['age_time'] = pre_norm_age.map(lambda x: x[1])


Unnamed: 0,animal_id,name,datetime,found_location,intake_type,intake_condition,animal_type,sex_upon_intake,age_upon_intake,breed,color,age_days,age_value,age_time
0,A786884,*Brock,2019-01-03 16:19:00,2501 Magin Meadow Dr in Austin (TX),Stray,Normal,Dog,Neutered Male,2 years,Beagle Mix,Tricolor,2,2,years
1,A706918,Belle,2015-07-05 12:59:00,9409 Bluegrass Dr in Austin (TX),Stray,Normal,Dog,Spayed Female,8 years,English Springer Spaniel,White/Liver,2,8,years
2,A724273,Runster,2016-04-14 18:43:00,2818 Palomino Trail in Austin (TX),Stray,Normal,Dog,Intact Male,11 months,Basenji Mix,Sable/White,2,11,months
3,A665644,No Name,2013-10-21 07:59:00,Austin (TX),Stray,Sick,Cat,Intact Female,4 weeks,Domestic Shorthair Mix,Calico,2,4,weeks
4,A682524,Rio,2014-06-29 10:38:00,800 Grove Blvd in Austin (TX),Stray,Normal,Dog,Neutered Male,4 years,Doberman Pinsch/Australian Cattle Dog,Tan/Gray,2,4,years


In [53]:
# Strip the plural 's' so 'year' and 'years' share one lookup key (week = 7, day = 1 assumed)
days_per_unit = {'year': 365, 'month': 30, 'week': 7, 'day': 1}
clean_intakes['age_days'] = (clean_intakes['age_value']
                             * clean_intakes['age_time'].str.rstrip('s').map(days_per_unit))

In [None]:
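# Loop version: build one list entry per row, then wrap the list in a Series
days = []
for number, unit in pre_norm_age:
    if unit.startswith('year'):
        days.append(int(number) * 365)
    elif unit.startswith('month'):
        days.append(int(number) * 30)
    elif unit.startswith('week'):
        days.append(int(number) * 7)
    else:
        days.append(int(number))
age_days = pd.Series(days, index=pre_norm_age.index)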

In [None]:
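# x is a [number, unit] list; startswith covers both 'year' and 'years'
pre_norm_age.map(lambda x: int(x[0]) * 365 if x[1].startswith('year')
                 else (int(x[0]) * 30 if x[1].startswith('month')
                       else (int(x[0]) * 7 if x[1].startswith('week')
                             else int(x[0]))))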

In [None]:
# Code here to work on level up #2
