## Intermediate Data Science

#### University of Redlands - DATA 201
#### Prof: Joanna Bieri [joanna_bieri@redlands.edu](mailto:joanna_bieri@redlands.edu)
#### [Class Website: data201.joannabieri.com](https://joannabieri.com/data201_intermediate.html)

## Computer Set Up

In [3]:
!python --version

Python 3.12.2


In [4]:
!conda --version

conda 24.11.3


In [5]:
!git --version

git version 2.45.1


**Clone the Repo for our Class**

<a href="https://github.com/Redlands-DATA201/FALL25" target="_blank">Redlands-DATA201/FALL2025</a>

In [None]:
# Depending on your setup, you might need to install these packages
!conda install -y numpy
!conda install -y pandas
!conda install -y matplotlib
!conda install -y plotly

In [1]:
# Some basic package imports
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.io as pio
pio.renderers.default = 'colab'

# Review of DATA 101

For our review we are going to jump in and do some Exploratory Data Analysis (EDA) on a data set that we have seen before. This time you are just given a .csv file and your goal is to analyze it - answering these questions.

**I expect this to feel impossible at first!!!** but if you all work together, share code, and look things up as needed, I know you can do it!

#### This was a Homework Assignment in Data 101 - but see how much you can figure out on your own!

* Answer the following questions using reproducible Python code.
    - What does it mean to be reproducible? This means someone else at any time in the future can run and understand your code almost like reading a blog post.
* For each question, state your answer in a sentence, e.g. "In this sample, the three most common first names of purchasers are ...".
* Note that the answers to all questions are within the context of this particular sample of sales, i.e. you shouldn't make inferences about the population of all Lego sales based on this sample.

0.  Read in the .csv file using Pandas and display it

1.  Describe what you see in the data set (variables, observations, etc)

2.  What are the three most common first names of purchasers?

3.  What are the three most common themes of Lego sets purchased?

4.  Among the most common theme of Lego sets purchased, what is the most common subtheme?

5.  Create data frames for each of the ages in the following categories: "18 and under", "19 - 25", "26 - 35", "36 - 50", "51 and over". HINT - use masks

6.  Which age group has purchased the highest number of Lego sets?

7.  Which age group has spent the most money on Legos?

8.  Which Lego theme has made the most money for Lego?

9.  Which area code has spent the most money on Legos? In the US, the area code is the first 3 digits of a phone number. Then, using a for loop, calculate the average amount of money spent per customer for each area code.

10.  Come up with a question you want to answer using these data, and write it down. Then, create a data visualization that answers the question, and explain how your visualization answers the question.

**This exercise comes from: https://datasciencebox.org/course-materials/hw-instructions/hw-05/hw-05-legos**

In [3]:
file_name = 'data/lego-sales.csv'
df = pd.read_csv(file_name)
display(df)

Unnamed: 0,first_name,last_name,age,phone_number,set_id,number,theme,subtheme,year,name,pieces,us_price,image_url,quantity
0,Kimberly,Beckstead,24,216-555-2549,24701,76062,DC Comics Super Heroes,Mighty Micros,2018,Robin vs. Bane,77.0,9.99,http://images.brickset.com/sets/images/76062-1...,1
1,Neel,Garvin,35,819-555-3189,25626,70595,Ninjago,Rise of the Villains,2018,Ultra Stealth Raider,1093.0,119.99,http://images.brickset.com/sets/images/70595-1...,1
2,Neel,Garvin,35,819-555-3189,24665,21031,Architecture,,2018,Burj Khalifa,333.0,39.99,http://images.brickset.com/sets/images/21031-1...,1
3,Chelsea,Bouchard,41,,24695,31048,Creator,,2018,Lakeside Lodge,368.0,29.99,http://images.brickset.com/sets/images/31048-1...,1
4,Chelsea,Bouchard,41,,25626,70595,Ninjago,Rise of the Villains,2018,Ultra Stealth Raider,1093.0,119.99,http://images.brickset.com/sets/images/70595-1...,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
615,Talise,Nieukirk,16,801-555-2343,24902,41556,Mixels,Series 7,2018,Tiketz,62.0,4.99,http://images.brickset.com/sets/images/41556-1...,2
616,Spencer,Morgan,28,784-555-3455,26041,41580,Mixels,Series 9,2018,Myke,63.0,4.99,,2
617,Spencer,Morgan,28,784-555-3455,26060,5005051,Gear,Digital Media,2018,Friends of Heartlake City Girlz 4 Life,,19.99,,1
618,Amelia,Hageman,40,336-555-1950,24702,76063,DC Comics Super Heroes,Mighty Micros,2018,The Flash vs. Captain Cold,88.0,9.99,http://images.brickset.com/sets/images/76063-1...,2


In [5]:
df.shape

(620, 14)

In [9]:
df.describe()

Unnamed: 0,age,set_id,year,pieces,us_price,quantity
count,620.0,620.0,620.0,551.0,620.0,620.0
mean,34.356452,25124.982258,2018.0,254.206897,29.041613,1.437097
std,11.276537,506.76072,0.0,357.738804,34.630623,0.712849
min,16.0,24548.0,2018.0,13.0,3.99,1.0
25%,25.0,24724.75,2018.0,70.0,9.99,1.0
50%,33.0,24804.5,2018.0,114.0,19.99,1.0
75%,41.0,25640.25,2018.0,313.0,29.99,2.0
max,68.0,26060.0,2018.0,4634.0,349.99,5.0
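For question 1, note that `describe()` reports a `pieces` count of 551 against 620 rows, so 69 values are missing in that column. A minimal sketch of how one might check shape, dtypes, and missing values directly is below. The tiny `sales` frame is made up for illustration and is not the class data.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only (not the class data)
sales = pd.DataFrame({
    "age": [24, 35, 41],
    "pieces": [77.0, np.nan, 368.0],
    "subtheme": ["Mighty Micros", None, None],
})

# Number of observations (rows) and variables (columns)
n_rows, n_cols = sales.shape

# Data type of each column
dtypes = sales.dtypes

# Number of missing values in each column
missing = sales.isna().sum()

print(n_rows, n_cols)
print(dtypes)
print(missing)
```

On the real data, the same three calls on `df` would give a quick summary to describe in a sentence or two.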


In [11]:
df['first_name'].value_counts()

first_name
Jackson     13
Jacob       11
Joseph      11
Michael     10
Kaitlyn      8
            ..
Erik         1
Kelly        1
Katelynn     1
Talise       1
Kimberly     1
Name: count, Length: 211, dtype: int64
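The counts above are per row, and each row is one purchase, so a purchaser who bought several sets is counted several times. A hedged sketch of the difference follows. It assumes that a (first name, last name) pair identifies one purchaser, which may not hold in the real data, and it uses a made-up `sales` frame.

```python
import pandas as pd

# Made-up rows for illustration only: one row per purchase
sales = pd.DataFrame({
    "first_name": ["Ana", "Ana", "Ana", "Ben", "Ben", "Cy", "Ana", "Dee"],
    "last_name":  ["Lee", "Lee", "Lee", "Ortiz", "Kim", "Voss", "Ruiz", "Park"],
})

# Counting rows: repeat purchases by the same person inflate the count
row_counts = sales["first_name"].value_counts().head(3)

# Counting purchasers: keep one row per (first_name, last_name) pair first.
# Assumption: that pair uniquely identifies a purchaser.
purchasers = sales.drop_duplicates(subset=["first_name", "last_name"])
purchaser_counts = purchasers["first_name"].value_counts().head(3)

print(row_counts)
print(purchaser_counts)
```

Either reading of the question can be defended. What matters for reproducibility is stating which one the answer uses.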

In [92]:
theme_count = df['theme'].value_counts()
display(theme_count)

theme
Star Wars                  75
Nexo Knights               64
Mixels                     55
Gear                       55
City                       45
Friends                    42
Ninjago                    38
Duplo                      35
Bionicle                   34
Creator                    25
Elves                      22
DC Comics Super Heroes     22
Marvel Super Heroes        19
Dimensions                 18
Disney Princess            15
The Angry Birds Movie      11
Architecture               10
Technic                    10
Minecraft                   9
Advanced Models             4
Ghostbusters                3
Seasonal                    3
Collectable Minifigures     3
Ideas                       2
Classic                     1
Name: count, dtype: int64

In [15]:
lock_theme = theme_count.head(3).index
display(lock_theme)

Index(['Star Wars', 'Nexo Knights', 'Mixels'], dtype='object', name='theme')

In [112]:
# Count how many times each theme appears (sorted from most to least common)
theme_counts = df['theme'].value_counts()
display(theme_counts)
# The most common theme is the first index label
top_theme = theme_counts.index[0]
display(top_theme)
# Keep only the rows for that theme
theme_rows = df[df['theme'] == top_theme]
display(theme_rows)
# Count subthemes within the top theme; the first label is the most common
count_subtheme = theme_rows['subtheme'].value_counts()
common_subtheme = count_subtheme.index[0]
common_subtheme

theme
Star Wars                  75
Nexo Knights               64
Mixels                     55
Gear                       55
City                       45
Friends                    42
Ninjago                    38
Duplo                      35
Bionicle                   34
Creator                    25
Elves                      22
DC Comics Super Heroes     22
Marvel Super Heroes        19
Dimensions                 18
Disney Princess            15
The Angry Birds Movie      11
Architecture               10
Technic                    10
Minecraft                   9
Advanced Models             4
Ghostbusters                3
Seasonal                    3
Collectable Minifigures     3
Ideas                       2
Classic                     1
Name: count, dtype: int64

'Star Wars'

Unnamed: 0,first_name,last_name,age,phone_number,set_id,number,theme,subtheme,year,name,pieces,us_price,image_url,quantity
6,Bryanna,Welsh,19,,24797,75138,Star Wars,Episode V,2018,Hoth Attack,233.0,24.99,http://images.brickset.com/sets/images/75138-1...,1
23,Amanda,Tronnier,45,317-555-7477,24959,75139,Star Wars,The Force Awakens,2018,Battle on Takodana,409.0,59.99,http://images.brickset.com/sets/images/75139-1...,1
28,Jacob,Nzabanita,31,339-555-2572,24793,75133,Star Wars,Battlefront,2018,Rebel Alliance Battle Pack,101.0,12.99,http://images.brickset.com/sets/images/75133-1...,2
30,Jacob,Nzabanita,31,339-555-2572,25920,75151,Star Wars,Episode III,2018,Clone Turbo Tank,903.0,109.99,http://images.brickset.com/sets/images/75151-1...,1
38,Riley,Ott,51,517-555-2093,24785,75126,Star Wars,MicroFighters,2018,First Order Snowspeeder,91.0,9.99,http://images.brickset.com/sets/images/75126-1...,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
596,Dustin,Vanvuuren,51,812-555-1009,25920,75151,Star Wars,Episode III,2018,Clone Turbo Tank,903.0,109.99,http://images.brickset.com/sets/images/75151-1...,1
600,Juana,Geisert,35,701-555-8100,25919,75150,Star Wars,Rebels,2018,Vader's TIE Advanced vs. A-wing Fighter,702.0,89.99,http://images.brickset.com/sets/images/75150-1...,2
603,Benjamin,Park,33,,24785,75126,Star Wars,MicroFighters,2018,First Order Snowspeeder,91.0,9.99,http://images.brickset.com/sets/images/75126-1...,1
604,Benjamin,Park,33,,25898,75145,Star Wars,Original Content,2018,Eclipse Fighter,363.0,29.99,http://images.brickset.com/sets/images/75145-1...,4


'The Force Awakens'
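Some rows have no subtheme, and `value_counts()` drops missing values by default, so they never compete for "most common". The sketch below, on a made-up `sales` frame, shows a compact version of the same lookup and how one might surface the missing values with `dropna=False`.

```python
import pandas as pd

# Made-up rows for illustration only
sales = pd.DataFrame({
    "theme": ["Star Wars", "Star Wars", "Star Wars", "City", "City", "Star Wars"],
    "subtheme": ["Rebels", "Episode V", "Rebels", "Fire", None, None],
})

# Most common theme: the label with the largest count
top_theme = sales["theme"].value_counts().idxmax()

# Subthemes of that theme only
subthemes = sales.loc[sales["theme"] == top_theme, "subtheme"]

# Missing subthemes are ignored here (the default is dropna=True)
top_subtheme = subthemes.value_counts().idxmax()

# Include missing subthemes in the tally to see how many there are
with_missing = subthemes.value_counts(dropna=False)

print(top_theme, top_subtheme)
print(with_missing)
```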

In [19]:
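# Boolean mask: True for purchasers aged 18 and under
first_mask = df['age'] <= 18
# Note: despite its name, `mask` holds the filtered DataFrame, not a boolean mask
mask = df[first_mask]
mask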

Unnamed: 0,first_name,last_name,age,phone_number,set_id,number,theme,subtheme,year,name,pieces,us_price,image_url,quantity
44,Michelle,Uguccioni,17,,24756,70310,Nexo Knights,,2018,Knighton Battle Blaster,76.0,9.99,http://images.brickset.com/sets/images/70310-1...,1
45,Michelle,Uguccioni,17,,24896,31046,Creator,,2018,Fast Car,222.0,19.99,http://images.brickset.com/sets/images/31046-1...,1
70,Lucas,Jimenez-Dominguez,18,712-555-0459,24691,31043,Creator,,2018,Chopper Transporter,124.0,9.99,http://images.brickset.com/sets/images/31043-1...,2
71,Lucas,Jimenez-Dominguez,18,712-555-0459,26034,41573,Mixels,Series 9,2018,Sweepz,61.0,4.99,,1
72,Lucas,Jimenez-Dominguez,18,712-555-0459,24704,41140,Disney Princess,Palace Pets,2018,Daisy's Beauty Salon,98.0,9.99,http://images.brickset.com/sets/images/41140-1...,1
73,Lucas,Jimenez-Dominguez,18,712-555-0459,24903,41558,Mixels,Series 7,2018,Mixadel,63.0,4.99,http://images.brickset.com/sets/images/41558-1...,2
74,Lucas,Jimenez-Dominguez,18,712-555-0459,26033,41572,Mixels,Series 9,2018,Gobbol,62.0,4.99,,1
187,Chayanne,Williams,17,869-555-6681,24678,60109,City,Fire,2018,Fire Boat,412.0,79.99,http://images.brickset.com/sets/images/60109-1...,2
188,Chayanne,Williams,17,869-555-6681,24697,76044,DC Comics Super Heroes,Batman v Superman: Dawn of Justice,2018,Clash of the Heroes,92.0,12.99,http://images.brickset.com/sets/images/76044-1...,1
223,Michael,Cruz,17,361-555-8212,24723,41172,Elves,,2018,The Water Dragon Adventure,212.0,19.99,http://images.brickset.com/sets/images/41172-1...,1


In [21]:
second_mask = (df['age'] >= 19) & (df['age'] <=25)
second_mask = df[second_mask]
second_mask

Unnamed: 0,first_name,last_name,age,phone_number,set_id,number,theme,subtheme,year,name,pieces,us_price,image_url,quantity
0,Kimberly,Beckstead,24,216-555-2549,24701,76062,DC Comics Super Heroes,Mighty Micros,2018,Robin vs. Bane,77.0,9.99,http://images.brickset.com/sets/images/76062-1...,1
6,Bryanna,Welsh,19,,24797,75138,Star Wars,Episode V,2018,Hoth Attack,233.0,24.99,http://images.brickset.com/sets/images/75138-1...,1
7,Bryanna,Welsh,19,,24701,76062,DC Comics Super Heroes,Mighty Micros,2018,Robin vs. Bane,77.0,9.99,http://images.brickset.com/sets/images/76062-1...,3
10,Chase,Fortenberry,19,205-555-3704,24707,10801,Duplo,,2018,Baby Animals,13.0,9.99,http://images.brickset.com/sets/images/10801-1...,1
11,Chase,Fortenberry,19,205-555-3704,24713,10809,Duplo,,2018,Police Patrol,15.0,14.99,http://images.brickset.com/sets/images/10809-1...,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
591,Paige,Ice,21,,26038,41577,Mixels,Series 9,2018,Mysto,64.0,4.99,,1
592,Aiden,Ganley,19,786-555-5067,25639,70324,Nexo Knights,,2018,Merlok's Library 2.0,288.0,24.99,http://images.brickset.com/sets/images/70324-1...,1
593,Aiden,Ganley,19,786-555-5067,24756,70310,Nexo Knights,,2018,Knighton Battle Blaster,76.0,9.99,http://images.brickset.com/sets/images/70310-1...,2
608,Carolyn,Quarry,23,567-555-7649,25627,70596,Ninjago,Rise of the Villains,2018,Samurai X Cave Chaos,1253.0,119.99,,2


In [23]:
third_mask = (df['age'] >= 26) & (df['age'] <= 35)
third_mask = df[third_mask]
third_mask

Unnamed: 0,first_name,last_name,age,phone_number,set_id,number,theme,subtheme,year,name,pieces,us_price,image_url,quantity
1,Neel,Garvin,35,819-555-3189,25626,70595,Ninjago,Rise of the Villains,2018,Ultra Stealth Raider,1093.0,119.99,http://images.brickset.com/sets/images/70595-1...,1
2,Neel,Garvin,35,819-555-3189,24665,21031,Architecture,,2018,Burj Khalifa,333.0,39.99,http://images.brickset.com/sets/images/21031-1...,1
27,Jacob,Nzabanita,31,339-555-2572,24732,41117,Friends,Pop Star,2018,Pop Star TV Studio,194.0,19.99,http://images.brickset.com/sets/images/41117-1...,1
28,Jacob,Nzabanita,31,339-555-2572,24793,75133,Star Wars,Battlefront,2018,Rebel Alliance Battle Pack,101.0,12.99,http://images.brickset.com/sets/images/75133-1...,2
29,Jacob,Nzabanita,31,339-555-2572,24723,41172,Elves,,2018,The Water Dragon Adventure,212.0,19.99,http://images.brickset.com/sets/images/41172-1...,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
606,Benjamin,Park,33,,24957,71309,Bionicle,Toa,2018,Onua - Uniter of Earth,143.0,19.99,http://images.brickset.com/sets/images/71309-1...,2
607,Miles,Hill,34,,24758,70312,Nexo Knights,,2018,Lance's Mecha Horse,237.0,19.99,http://images.brickset.com/sets/images/70312-1...,2
610,Jennifer,Reinert,29,402-555-0467,24660,71241,Dimensions,Fun Pack,2018,Fun Pack: Slimer,33.0,14.99,http://images.brickset.com/sets/images/71241-1...,3
616,Spencer,Morgan,28,784-555-3455,26041,41580,Mixels,Series 9,2018,Myke,63.0,4.99,,2


In [25]:
fourth_mask = (df['age'] >= 36) & (df['age'] <= 50)
fourth_mask = df[fourth_mask]
fourth_mask

Unnamed: 0,first_name,last_name,age,phone_number,set_id,number,theme,subtheme,year,name,pieces,us_price,image_url,quantity
3,Chelsea,Bouchard,41,,24695,31048,Creator,,2018,Lakeside Lodge,368.0,29.99,http://images.brickset.com/sets/images/31048-1...,1
4,Chelsea,Bouchard,41,,25626,70595,Ninjago,Rise of the Villains,2018,Ultra Stealth Raider,1093.0,119.99,http://images.brickset.com/sets/images/70595-1...,1
5,Chelsea,Bouchard,41,,24721,10831,Duplo,,2018,My First Caterpillar,19.0,9.99,http://images.brickset.com/sets/images/10831-1...,1
8,Caleb,Garcia-Wideman,37,907-555-9236,24730,41115,Friends,,2018,Emma's Creative Workshop,108.0,9.99,http://images.brickset.com/sets/images/41115-1...,1
9,Caleb,Garcia-Wideman,37,907-555-9236,25611,21127,Minecraft,Minifig-scale,2018,The Fortress,,109.99,http://images.brickset.com/sets/images/21127-1...,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
598,Payton,Milsap,42,,24702,76063,DC Comics Super Heroes,Mighty Micros,2018,The Flash vs. Captain Cold,88.0,9.99,http://images.brickset.com/sets/images/76063-1...,3
599,Payton,Milsap,42,,24678,60109,City,Fire,2018,Fire Boat,412.0,79.99,http://images.brickset.com/sets/images/60109-1...,2
602,Stephanie,Harrison,42,,24736,41121,Friends,Adventure Camp,2018,Adventure Camp Rafting,320.0,29.99,http://images.brickset.com/sets/images/41121-1...,1
618,Amelia,Hageman,40,336-555-1950,24702,76063,DC Comics Super Heroes,Mighty Micros,2018,The Flash vs. Captain Cold,88.0,9.99,http://images.brickset.com/sets/images/76063-1...,2


In [27]:
fifth_mask = (df['age'] >= 51)
fifth_mask = df[fifth_mask]
fifth_mask

Unnamed: 0,first_name,last_name,age,phone_number,set_id,number,theme,subtheme,year,name,pieces,us_price,image_url,quantity
31,Hannah,Drews Stunkel,55,339-555-6320,25628,70590,Ninjago,,2018,Airjitzu Battle Grounds,666.0,59.99,,1
32,Hannah,Drews Stunkel,55,339-555-6320,25624,70593,Ninjago,Skybound,2018,The Green NRG Dragon,567.0,49.99,http://images.brickset.com/sets/images/70593-1...,3
33,Hannah,Drews Stunkel,55,339-555-6320,24734,41119,Friends,,2018,Heartlake Cupcake Cafe,439.0,39.99,http://images.brickset.com/sets/images/41119-1...,1
38,Riley,Ott,51,517-555-2093,24785,75126,Star Wars,MicroFighters,2018,First Order Snowspeeder,91.0,9.99,http://images.brickset.com/sets/images/75126-1...,2
39,Riley,Ott,51,517-555-2093,24772,70604,Ninjago,Skybound,2018,Tiger Widow Island,450.0,49.99,http://images.brickset.com/sets/images/70604-1...,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
552,Angel,Payne,62,,25605,75146,Star Wars,Seasonal,2018,Star Wars Advent Calendar,282.0,39.99,http://images.brickset.com/sets/images/75146-1...,4
553,Angel,Payne,62,,24793,75133,Star Wars,Battlefront,2018,Rebel Alliance Battle Pack,101.0,12.99,http://images.brickset.com/sets/images/75133-1...,1
594,Dustin,Vanvuuren,51,812-555-1009,24709,10803,Duplo,,2018,Arctic,40.0,29.99,http://images.brickset.com/sets/images/10803-1...,2
595,Dustin,Vanvuuren,51,812-555-1009,24807,71235,Dimensions,Level Pack,2018,Level Pack: Midway Arcade,96.0,29.99,http://images.brickset.com/sets/images/71235-1...,1


In [29]:
df['age'].value_counts().sort_index() 

age
16    10
17     7
18    13
19    10
20    20
21    23
22    19
23    23
24    22
25    12
26    29
27    22
28    15
29     8
30    25
31    12
32    10
33    32
34    13
35    17
36    15
37    22
38    30
39    26
40    18
41    20
42    15
43    13
44    13
45     8
46     6
47     8
48     5
49     4
50    13
51     9
52     4
53     9
54     4
55     5
56     8
57     3
59     3
61     8
62     3
63     3
68     3
Name: count, dtype: int64

In [31]:
mask['quantity'].sum() 

45

In [33]:
second_mask['quantity'].sum() 

174

In [35]:
third_mask['quantity'].sum()

267

In [37]:
fourth_mask['quantity'].sum()

313

In [39]:
fifth_mask['quantity'].sum()

92

In [41]:
mask['us_price'].sum()

641.7

In [43]:
second_mask['us_price'].sum()

3629.710000000001

In [45]:
third_mask['us_price'].sum()

5260.169999999998

In [47]:
fourth_mask['us_price'].sum()

6641.84

In [49]:
fifth_mask['us_price'].sum()

1832.38
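Summing `us_price` adds up the price of one copy per row and does not account for `quantity`. If "money spent" is read as price times quantity, the totals would need a derived column first. A minimal sketch under that assumption follows, using a made-up `sales` frame and a hypothetical `total_spent` column name.

```python
import pandas as pd

# Made-up rows for illustration only
sales = pd.DataFrame({
    "age":      [17, 22, 22, 30, 45, 60],
    "us_price": [10.0, 20.0, 5.0, 100.0, 50.0, 30.0],
    "quantity": [1, 2, 1, 3, 4, 1],
})

# Label each row with its age group (200 is an arbitrary upper bound)
bins = [0, 18, 25, 35, 50, 200]
labels = ["18 and under", "19 - 25", "26 - 35", "36 - 50", "51 and over"]
sales["age_group"] = pd.cut(sales["age"], bins=bins, labels=labels)

# Assumption: money spent on a row is unit price times quantity
sales["total_spent"] = sales["us_price"] * sales["quantity"]

# Sets bought and money spent per age group
summary = sales.groupby("age_group", observed=False)[["quantity", "total_spent"]].sum()

most_sets = summary["quantity"].idxmax()
most_money = summary["total_spent"].idxmax()

print(summary)
print(most_sets, most_money)
```

In this toy data the group that buys the most sets is not the group that spends the most, which is why questions 6 and 7 are worth answering separately.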

8.

In [51]:
price_mask = df.groupby('theme')['us_price'].sum()
price_mask

theme
Advanced Models             679.96
Architecture                389.90
Bionicle                    549.66
City                       1476.55
Classic                      29.99
Collectable Minifigures      11.97
Creator                     409.75
DC Comics Super Heroes      450.78
Dimensions                  369.82
Disney Princess             167.85
Duplo                       854.65
Elves                       809.78
Friends                     924.58
Gear                       1056.45
Ghostbusters                469.97
Ideas                       139.98
Marvel Super Heroes         539.81
Minecraft                  1439.91
Mixels                      274.45
Nexo Knights               1569.36
Ninjago                    1649.62
Seasonal                     29.97
Star Wars                  2842.25
Technic                     492.90
The Angry Birds Movie       375.89
Name: us_price, dtype: float64
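The result above is sorted alphabetically by theme, so the largest value has to be picked out by eye. A hedged sketch of reading the answer off programmatically is below. It also weights each row by `quantity`, which is an assumption about what "made the most money" means, and it uses made-up data.

```python
import pandas as pd

# Made-up rows for illustration only
sales = pd.DataFrame({
    "theme":    ["Star Wars", "City", "Star Wars", "Mixels", "City"],
    "us_price": [24.99, 79.99, 9.99, 4.99, 19.99],
    "quantity": [1, 2, 2, 3, 1],
})

# Revenue per theme, largest first.
# Assumption: revenue for a row is unit price times quantity.
revenue = (
    sales.assign(revenue=sales["us_price"] * sales["quantity"])
         .groupby("theme")["revenue"]
         .sum()
         .sort_values(ascending=False)
)

# After sorting, the first index label is the top-earning theme
top_theme = revenue.index[0]

print(revenue)
print(top_theme)
```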

9.

In [61]:
# Total price per full phone number (not yet grouped by area code)
find_areacode = df.groupby('phone_number')['us_price'].sum()
find_areacode

phone_number
205-555-3704     24.98
205-555-7084     49.98
205-555-7773     64.97
206-555-3697     24.99
209-555-6030     29.99
                 ...  
973-555-3236    219.98
973-555-5517    224.96
979-555-1121    159.98
980-555-0099     10.98
989-555-3671     29.98
Name: us_price, Length: 213, dtype: float64

In [88]:
two_columns = ['phone_number','us_price'] 
area = df[two_columns]
area

Unnamed: 0,phone_number,us_price
0,216-555-2549,9.99
1,819-555-3189,119.99
2,819-555-3189,39.99
3,,29.99
4,,119.99
...,...,...
615,801-555-2343,4.99
616,784-555-3455,4.99
617,784-555-3455,19.99
618,336-555-1950,9.99
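To finish question 9, the area code still has to be pulled out of `phone_number`. One possible sketch follows, on made-up data. It assumes that each distinct phone number is one customer, that rows with no phone number can be dropped, and that money spent is price times quantity. The column names `area_code` and `total_spent` are hypothetical.

```python
import pandas as pd

# Made-up rows for illustration only
sales = pd.DataFrame({
    "phone_number": ["205-555-0001", "205-555-0001", "205-555-0002",
                     "819-555-0003", None],
    "us_price":     [10.0, 20.0, 30.0, 100.0, 5.0],
    "quantity":     [1, 1, 2, 1, 1],
})

# Rows without a phone number have no area code, so drop them
with_phone = sales.dropna(subset=["phone_number"]).copy()

# The area code is the first 3 characters of the phone number
with_phone["area_code"] = with_phone["phone_number"].str[:3]
with_phone["total_spent"] = with_phone["us_price"] * with_phone["quantity"]

# Total spent per area code, and the area code with the largest total
spent_by_area = with_phone.groupby("area_code")["total_spent"].sum()
top_area = spent_by_area.idxmax()

# For loop: average spent per customer in each area code
avg_per_customer = {}
for code in spent_by_area.index:
    rows = with_phone[with_phone["area_code"] == code]
    # Assumption: one distinct phone number corresponds to one customer
    n_customers = rows["phone_number"].nunique()
    avg_per_customer[code] = rows["total_spent"].sum() / n_customers

print(top_area)
print(avg_per_customer)
```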
