Lambda School Data Science

*Unit 3, Med Cabinet Build*

---

In [90]:
import pandas as pd


pd.set_option('display.max_rows', 500)
df = pd.read_csv('cannabis.csv')

In [91]:
# Overview of the data

df.head()

Unnamed: 0,Strain,Type,Rating,Effects,Flavor,Description
0,100-Og,hybrid,4.0,"Creative,Energetic,Tingly,Euphoric,Relaxed","Earthy,Sweet,Citrus",$100 OG is a 50/50 hybrid strain that packs a ...
1,98-White-Widow,hybrid,4.7,"Relaxed,Aroused,Creative,Happy,Energetic","Flowery,Violet,Diesel",The ‘98 Aloha White Widow is an especially pot...
2,1024,sativa,4.4,"Uplifted,Happy,Relaxed,Energetic,Creative","Spicy/Herbal,Sage,Woody",1024 is a sativa-dominant hybrid bred in Spain...
3,13-Dawgs,hybrid,4.2,"Tingly,Creative,Hungry,Relaxed,Uplifted","Apricot,Citrus,Grapefruit",13 Dawgs is a hybrid of G13 and Chemdawg genet...
4,24K-Gold,hybrid,4.6,"Happy,Relaxed,Euphoric,Uplifted,Talkative","Citrus,Earthy,Orange","Also known as Kosher Tangie, 24k Gold is a 60%..."


## Step 1 - Effects Search Preparation

We need to break out the effects for each strain into something searchable. To do this, we first count the unique effects so we know how many new columns the encoding will need.

In [92]:
# This is what the Effects value looks like for one row

df['Effects'][0]

'Creative,Energetic,Tingly,Euphoric,Relaxed'

In [93]:
# Python considers this a string

type(df['Effects'][0])

str

In [94]:
# Pandas can turn the strings in a Series into lists with the split method.
# The .str accessor applies a string method to each element of the Series,
# so .str.split(',') splits every string on commas. Without .str, pandas would
# look for a split method on the Series itself, which does not exist.

df['Effects_List'] = df['Effects'].str.split(',')
df['Effects_List']

0       [Creative, Energetic, Tingly, Euphoric, Relaxed]
1         [Relaxed, Aroused, Creative, Happy, Energetic]
2        [Uplifted, Happy, Relaxed, Energetic, Creative]
3          [Tingly, Creative, Hungry, Relaxed, Uplifted]
4        [Happy, Relaxed, Euphoric, Uplifted, Talkative]
                              ...                       
2346     [Happy, Uplifted, Relaxed, Euphoric, Energetic]
2347        [Relaxed, Happy, Euphoric, Uplifted, Sleepy]
2348       [Relaxed, Sleepy, Talkative, Euphoric, Happy]
2349          [Relaxed, Sleepy, Euphoric, Happy, Hungry]
2350          [Hungry, Relaxed, Uplifted, Happy, Sleepy]
Name: Effects_List, Length: 2351, dtype: object

In [95]:
# Now Python sees each value in the column as a list.

type(df['Effects_List'][0])

list

In [84]:
df['Effects_List'][0]

['Creative', 'Energetic', 'Tingly', 'Euphoric', 'Relaxed']

In [85]:
# This gives the number of effects per row. The truncated output only shows 5s,
# but some rows have fewer than 5 effects, and none have more.
# We might need to do something about that, but for now we can ignore it.

df['Effects_List'].str.len()

0       5
1       5
2       5
3       5
4       5
       ..
2346    5
2347    5
2348    5
2349    5
2350    5
Name: Effects_List, Length: 2351, dtype: int64
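
The truncated display above hides any rows that are not length 5, so a quick way to confirm the range is to summarise the lengths instead of printing them. This is a minimal sketch on a toy Series; `effects_list` and its values are made up for illustration and stand in for `df['Effects_List']`.

```python
import pandas as pd

# Toy stand-in for df['Effects_List']; values are illustrative, not from cannabis.csv.
effects_list = pd.Series([
    ['Creative', 'Energetic', 'Tingly', 'Euphoric', 'Relaxed'],
    ['Creative'],
    ['Relaxed', 'Happy', 'Sleepy'],
])

# Number of effects in each row.
lengths = effects_list.str.len()

# Smallest and largest list length, then how many rows have each length.
print(lengths.min(), lengths.max())
print(lengths.value_counts().sort_index())
```

On the real data, the same three lines on `df['Effects_List']` would show how many strains list fewer than 5 effects.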

In [86]:
# There are 15 unique values (this count includes the literal label 'None').
# Next we need to encode them.

print(len(df['Effects_List'].apply(pd.Series).stack().value_counts()))
df['Effects_List'].apply(pd.Series).stack().value_counts()

15


Happy        1871
Relaxed      1726
Euphoric     1635
Uplifted     1507
Creative      747
Sleepy        738
Energetic     646
Focused       595
Hungry        479
Talkative     360
Tingly        346
Giggly        298
Aroused       199
None           87
Dry             1
dtype: int64
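
As an alternative to `apply(pd.Series).stack()`, the same counts can probably be had with `Series.explode`, which puts each list element on its own row. This is a sketch on toy data, assuming pandas 0.25 or newer (where `explode` was added); the `effects` Series is hypothetical.

```python
import pandas as pd

# Toy stand-in for df['Effects']; values are illustrative, not from cannabis.csv.
effects = pd.Series(['Creative,Happy', 'Happy,Relaxed', 'None'])

# Split each string into a list, give every effect its own row, then count.
counts = effects.str.split(',').explode().value_counts()

# Number of distinct labels, which is the number of columns an encoding would need.
n_unique = counts.size

print(counts)
print(n_unique)
```

Note that a literal `'None'` string is counted like any other label, so whether to keep it as a column is a separate decision.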

## Step 2 - Encoding

In [89]:
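# These two filters do the same thing, but they don't work with lists.
# They compare the whole Effects string, so they only return EXACT MATCHES:
# rows whose only effect is 'Creative'.
# (Only the result of the last expression in the cell is displayed.)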

df.loc[df['Effects'] == 'Creative']

df.loc[df['Effects'].isin(['Creative'])]

Unnamed: 0,Strain,Type,Rating,Effects,Flavor,Description,Effects_List
355,Blukashima,hybrid,5.0,Creative,,Using a Chernobyl male plant to pollenate thei...,[Creative]
369,Brain-Candy,hybrid,5.0,Creative,Sweet,Brain Candy by Insanity Strains is a handy hyb...,[Creative]
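
For comparison, here is one way to search inside the lists rather than match the whole string. It is a sketch on a made-up frame (`df_demo` and its strain names are hypothetical), using a plain Python membership test on each row's list.

```python
import pandas as pd

# Toy frame with the same column names as the notebook; rows are illustrative.
df_demo = pd.DataFrame({
    'Strain': ['A', 'B', 'C'],
    'Effects': ['Creative', 'Creative,Happy', 'Relaxed,Sleepy'],
})
df_demo['Effects_List'] = df_demo['Effects'].str.split(',')

# Exact match: only rows whose entire Effects string is 'Creative'.
exact = df_demo.loc[df_demo['Effects'] == 'Creative']

# Membership test: rows whose list contains 'Creative' anywhere.
contains = df_demo.loc[
    df_demo['Effects_List'].apply(lambda effects: 'Creative' in effects)
]

print(exact['Strain'].tolist())
print(contains['Strain'].tolist())
```

This works, but it runs a Python function per row, which is part of why encoding the effects into 0/1 columns is attractive for searching.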


### MultiLabelBinarizer

In [99]:
from sklearn.preprocessing import MultiLabelBinarizer


mlb = MultiLabelBinarizer()

# Encode each list of effects as one 0/1 column per unique effect.
print(
    pd.DataFrame(
        mlb.fit_transform(df['Effects_List']),
        columns=mlb.classes_,
        index=df.index,
    )
)

      Aroused  Creative  Dry  Energetic  Euphoric  Focused  Giggly  Happy  \
0           0         1    0          1         1        0       0      0   
1           1         1    0          1         0        0       0      1   
2           0         1    0          1         0        0       0      1   
3           0         1    0          0         0        0       0      0   
4           0         0    0          0         1        0       0      1   
...       ...       ...  ...        ...       ...      ...     ...    ...   
2346        0         0    0          1         1        0       0      1   
2347        0         0    0          0         1        0       0      1   
2348        0         0    0          0         1        0       0      1   
2349        0         0    0          0         1        0       0      1   
2350        0         0    0          0         0        0       0      1   

      Hungry  None  Relaxed  Sleepy  Talkative  Tingly  Uplifted  
0       

In [104]:
# Keep the encoded effects in their own frame, aligned to df by index.
df2 = pd.DataFrame(
    mlb.fit_transform(df['Effects_List']),
    columns=mlb.classes_,
    index=df.index,
)

df2

Unnamed: 0,Aroused,Creative,Dry,Energetic,Euphoric,Focused,Giggly,Happy,Hungry,None,Relaxed,Sleepy,Talkative,Tingly,Uplifted
0,0,1,0,1,1,0,0,0,0,0,1,0,0,1,0
1,1,1,0,1,0,0,0,1,0,0,1,0,0,0,0
2,0,1,0,1,0,0,0,1,0,0,1,0,0,0,1
3,0,1,0,0,0,0,0,0,1,0,1,0,0,1,1
4,0,0,0,0,1,0,0,1,0,0,1,0,1,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2346,0,0,0,1,1,0,0,1,0,0,1,0,0,0,1
2347,0,0,0,0,1,0,0,1,0,0,1,1,0,0,1
2348,0,0,0,0,1,0,0,1,0,0,1,1,1,0,0
2349,0,0,0,0,1,0,0,1,1,0,1,1,0,0,0
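
Once the effects are 0/1 columns, searching becomes a boolean mask. The sketch below shows the idea on toy data. To stay pandas-only it builds the encoded frame with `Series.str.get_dummies(sep=',')` instead of `MultiLabelBinarizer`; that is a swapped-in technique that should give the same kind of 0/1 layout, and `df_demo` and `encoded` are hypothetical names.

```python
import pandas as pd

# Toy frame; rows are illustrative, not from cannabis.csv.
df_demo = pd.DataFrame({
    'Strain': ['A', 'B', 'C'],
    'Effects': ['Creative,Happy,Relaxed', 'Happy,Relaxed', 'Creative'],
})

# One 0/1 column per effect, split on commas (pandas-only stand-in for df2).
encoded = df_demo['Effects'].str.get_dummies(sep=',')

# Strains that have BOTH Creative and Happy.
mask = (encoded['Creative'] == 1) & (encoded['Happy'] == 1)
matches = df_demo.loc[mask, 'Strain']

print(encoded.columns.tolist())
print(matches.tolist())
```

Because `encoded` shares its index with `df_demo`, the mask can be applied directly to the original frame.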


In [107]:
df

Unnamed: 0,Strain,Type,Rating,Effects,Flavor,Description,Effects_List
0,100-Og,hybrid,4.0,"Creative,Energetic,Tingly,Euphoric,Relaxed","Earthy,Sweet,Citrus",$100 OG is a 50/50 hybrid strain that packs a ...,"[Creative, Energetic, Tingly, Euphoric, Relaxed]"
1,98-White-Widow,hybrid,4.7,"Relaxed,Aroused,Creative,Happy,Energetic","Flowery,Violet,Diesel",The ‘98 Aloha White Widow is an especially pot...,"[Relaxed, Aroused, Creative, Happy, Energetic]"
2,1024,sativa,4.4,"Uplifted,Happy,Relaxed,Energetic,Creative","Spicy/Herbal,Sage,Woody",1024 is a sativa-dominant hybrid bred in Spain...,"[Uplifted, Happy, Relaxed, Energetic, Creative]"
3,13-Dawgs,hybrid,4.2,"Tingly,Creative,Hungry,Relaxed,Uplifted","Apricot,Citrus,Grapefruit",13 Dawgs is a hybrid of G13 and Chemdawg genet...,"[Tingly, Creative, Hungry, Relaxed, Uplifted]"
4,24K-Gold,hybrid,4.6,"Happy,Relaxed,Euphoric,Uplifted,Talkative","Citrus,Earthy,Orange","Also known as Kosher Tangie, 24k Gold is a 60%...","[Happy, Relaxed, Euphoric, Uplifted, Talkative]"
...,...,...,...,...,...,...,...
2346,Zeus-Og,hybrid,4.7,"Happy,Uplifted,Relaxed,Euphoric,Energetic","Earthy,Woody,Pine",Zeus OG is a hybrid cross between Pineapple OG...,"[Happy, Uplifted, Relaxed, Euphoric, Energetic]"
2347,Zkittlez,indica,4.6,"Relaxed,Happy,Euphoric,Uplifted,Sleepy","Sweet,Berry,Grape",Zkittlez is an indica-dominant mix of Grape Ap...,"[Relaxed, Happy, Euphoric, Uplifted, Sleepy]"
2348,Zombie-Kush,indica,5.0,"Relaxed,Sleepy,Talkative,Euphoric,Happy","Earthy,Sweet,Spicy/Herbal",Zombie Kush by Ripper Seeds comes from two dif...,"[Relaxed, Sleepy, Talkative, Euphoric, Happy]"
2349,Zombie-Og,indica,4.4,"Relaxed,Sleepy,Euphoric,Happy,Hungry","Sweet,Earthy,Pungent",If you’re looking to transform into a flesh-ea...,"[Relaxed, Sleepy, Euphoric, Happy, Hungry]"


In [106]:
df.merge(df2, left_index=True)

MergeError: Must pass right_on or right_index=True
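
The error says that `left_index=True` only tells pandas how to key the left frame; the right frame also needs a key. Since both frames share the same index, passing `right_index=True` as well should resolve it. This is a minimal sketch on two hand-built frames; `left` and `right` are stand-ins for `df` and `df2`.

```python
import pandas as pd

# Hand-built stand-ins for df and df2; values are illustrative.
left = pd.DataFrame({'Strain': ['A', 'B']})
right = pd.DataFrame({'Creative': [1, 0], 'Happy': [1, 1]})

# Merge on the index of both frames.
merged = left.merge(right, left_index=True, right_index=True)

# DataFrame.join aligns on the index by default, so this is a shorter equivalent here.
joined = left.join(right)

print(merged)
```

In the notebook, that would be `df.merge(df2, left_index=True, right_index=True)` or `df.join(df2)`, assuming no column names collide between the two frames.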