# University of Stirling

# ITNPBD2 Representing and Manipulating Data

# Assignment Spring 2024

# A Consultancy Job for JC Penney

This notebook forms the assignment instructions and submission document of the assignment for ITNPBD2. Read the instructions carefully and enter code into the cells as indicated.

You will need these five files, which were in the Zip file you downloaded from the course webpage:

- jcpenney_reviewers.json
- jcpenney_products.json
- products.csv
- reviews.csv
- users.csv

The data in these files describes products that have been sold by the American retail giant, JC Penney, and reviews by customers who bought them. Note that the product data is real, but the customer data is synthetic.

Your job is to process the data, as requested in the instructions in the markdown cells in this notebook.

# Completing the Assignment

Rename this file to be xxxxxx_BD2 where xxxxxx is your student number, then type your code and narrative description into the boxes provided. Add as many code and markdown cells as you need. The cells should contain:

- **Text narrative describing what you did with the data**
- **The code that performs the task you have described**
- **Comments that explain your code**

# Marking Scheme
The assessment will be marked against the University's Common Marking Scheme (CMS).

Here is a summary of what you need to achieve to gain a grade in the major grade bands:

|Grade|Requirement|
|:---|:---|
| Fail | You will fail if your code does not run or does not achieve even the basics of the task. You may also fail if you submit code without either comments or a text explanation of what the code does.|
| Pass | To pass, you must submit sufficient working code to show that you have mastered the basics of the task, even if not everything works completely. You must include some justification for your choice of methods, though you need not mention alternatives. |
| Merit | For a merit, your code must be mostly correct, with only small problems or parts missing, and your comments must be useful rather than simply re-stating the code in English. Most choices for methods and structures should be explained and alternatives mentioned. |
| Distinction | For a distinction, your code must be working, correct and well commented, and must show an appreciation of style, efficiency and reliability. All choices of methods and structures must be concisely justified, and alternatives must be given well-thought-out consideration. Your work should be good enough to present to executives at the company.|

The full details of the CMS can be found here:

https://www.stir.ac.uk/about/professional-services/student-academic-and-corporate-services/academic-registry/academic-policy-and-practice/quality-handbook/assessment-policy-and-procedure/appendix-2-postgraduate-common-marking-scheme/

Note that this means there is no fixed number of marks allocated to each stage of the assignment. Your grade will reflect how well your solutions and comments demonstrate that you have achieved the learning outcomes of the task.

## Submission
When you are ready to submit, **print** your notebook as a PDF (in the Jupyter menu, go to File -> Print Preview). Make sure you have run all the cells and that their output is displayed. Any lines of code or comments that are not visible in the PDF should be broken across several lines. You can then submit the file online.

Late penalties will apply at a rate of three marks per day, up to a maximum of 7 days. After 7 days you will be given a mark of 0. Extensions will be considered under acceptable circumstances outside your control.

## Academic Integrity

This is an individual assignment, and so all submitted work must be fully your own work.

The University of Stirling is committed to protecting the quality and standards of its awards. Consequently, the University seeks to promote and nurture academic integrity, support staff academic integrity, and support students to understand and develop good academic skills that facilitate academic integrity.

In addition, the University deals decisively with all forms of Academic Misconduct.

Where a student does not act with academic integrity, their work or behaviour may demonstrate Poor Academic Practice or it may represent Academic Misconduct.

### Poor Academic Practice

Poor Academic Practice is defined as: "The submission of any type of assessment with a lack of referencing or inadequate referencing which does not effectively acknowledge the origin of words, ideas, images, tables, diagrams, maps, code, sound and any other sources used in the assessment."

### Academic Misconduct

Academic Misconduct is defined as: "any act or attempted act that does not demonstrate academic integrity and that may result in creating an unfair academic advantage for you or another person, or an academic disadvantage for any other member or members of the academic community."

Plagiarism is presenting somebody else’s work as your own **and includes the use of artificial intelligence tools such as GPT or CoPilot**. Plagiarism is a form of academic misconduct and is taken very seriously by the University. Students found to have plagiarised work can have marks deducted and, in serious cases, even be expelled from the University. Do not submit any work that is not entirely your own. Do not collaborate with or get help from anybody else with this assignment.

The University of Stirling's full policy on Academic Integrity can be found at:

https://www.stir.ac.uk/about/professional-services/student-academic-and-corporate-services/academic-registry/academic-policy-and-practice/quality-handbook/academic-integrity-policy-and-academic-misconduct-procedure/

## The Assignment
Your task with this assignment is to use the data provided to demonstrate your Python data manipulation skills.

There are three `.csv` files and two `.json` files so you can process different types of data. The files also contain unstructured data in the form of natural language in English and links to images that you can access from the JC Penney website (use the field called `product_image_urls`).

Start with easy tasks to show you can read in a file, create some variables and data structures, and manipulate their contents. Then move onto something more interesting.

Look at the data that we provided with this assessment and think of something interesting to do with it using whatever libraries you like. Describe what you decide to do with the data and why it might be interesting or useful to the company to do it.

You can add additional data if you need to - either download it or access it using `requests`. Produce working code to implement your ideas in as many cells as you need below. There is no single right answer, the aim is to simply show you are competent in using python for data analysis. Exactly how you do that is up to you.

For a distinction class grade, this must show originality, creative thinking, and insights beyond what you've been taught directly on the module.

## Structure
You may structure the project how you wish, but here is a suggested guideline to help you organise your work:

 1. Data Exploration - Explore the data and show you understand its structure and relations
 2. Data Validation - Check the quality of the data. Is it complete? Are there obvious errors?
 3. Data Visualisation - Gain an overall understanding of the data with visualisations
 4. Data Analysis - Set some questions and use the data to answer them
 5. Data Augmentation - Add new data from another source to bring new insights to the data you already have

# Remember to make sure you are working completely on your own.
# Don't work in a group or with a friend
You may NOT use any automated code generation or analytics tools for this assignment, so do not use tools like GPT. You can look up the syntax for the functions you use, but you must write the code yourself and the comments must provide an insightful analysis of the results.



# Assignment Start

## Data Exploration

- We begin by loading the data to see how it is organized.
- We load each of the five files into its own pandas DataFrame.

In [1]:
import pandas as pd

- For each dataset, we set an index and display a random sample of 10 rows
- We display the data type of each column with `df.info()`
- We use the `df.describe()` method to get summary statistics for the numerical columns
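The three steps above can be wrapped in one small helper so that every dataset is summarised in the same way. This is a sketch: the function name `summarise` and the fixed `random_state` (which makes the sample repeatable between runs) are our own choices, not part of the assignment, and `demo` is a made-up frame.

```python
import pandas as pd


def summarise(df: pd.DataFrame, n: int = 10, seed: int = 0) -> dict:
    """Return a reproducible sample, the column dtypes and summary statistics."""
    return {
        # random_state makes the sample identical every time the notebook runs
        "sample": df.sample(min(n, len(df)), random_state=seed),
        "dtypes": df.dtypes,
        # describe() summarises numeric columns; text columns are skipped
        # when numeric ones exist
        "describe": df.describe(),
    }


demo = pd.DataFrame({"price": [10.0, 20.0, 30.0], "name": ["a", "b", "c"]})
summary = summarise(demo, n=2)
print(summary["describe"])
```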

### jcpenney_products.json

In [2]:
jcpenney_products = pd.read_json('jcpenney_products.json', lines=True)
jcpenney_products = jcpenney_products.set_index('uniq_id')
jcpenney_products.sample(10)

| uniq_id | sku | name_title | description | list_price | sale_price | category | category_tree | average_product_rating | product_url | product_image_urls | brand | total_number_reviews | Reviews | Bought With |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| e17fc1a2ad324952b5068085b45787ce | pp5005730205 | Disney Collection Olaf Costume - Kids 2-10 | This soft, heart-melting Olaf costume surround... | 48.34 | 22.96 | | | 2.0 | http://www.jcpenney.com/disney-collection-olaf... | http://s7d9.scene7.com/is/image/JCPenney/DP062... | DISNEY | 2 | [{'User': 'txhh1124', 'Review': 'This costume ... | [6bd5c9683651317cbeb14b24af906ccd, baf5ab9e958... |
| bc6ea9637f34ddf623c913a4cbccecd0 | pp5006243233 | Knit Works® 3/4-Sleeve Top and Scarf Set - Gir... | An easy and cute outfit combination, this top ... | 37.84 | 22.45 | girls' graphic tees | jcpenney\|girls' graphic tees | 3.333333 | http://www.jcpenney.com/knit-works-34-sleeve-t... | http://s7d2.scene7.com/is/image/JCPenney/DP110... | Knit Works | 3 | [{'User': 'aeyn4332', 'Review': 'This top come... | [2cb178d76662bbc5a5a11c4f5a1ce8a1, 9b0da237812... |
| 5d4c2f94fce965b9b47d4529eea54211 | pp5006660117 | ZCO Fleur De Lis Turn Up Fray Hem Shorts - Plus | Our ZCO shorts feature a frayed hem and fleur ... | 56.76 | 33.1 | shorts | jcpenney\|shorts | 1.0 | http://www.jcpenney.com/zco-fleur-de-lis-turn-... | http://s7d2.scene7.com/is/image/JCPenney/DP021... | ZCO JEANS | 1 | [{'User': 'wpwy2334', 'Review': 'I am into the... | [c3963c785d241a2df7c8d45b1d0ea1bf, bb335c0d549... |
| d907b06d1eaa998d5f73bda1bbc8c545 | pp5006620412 | Island Shores™ Short-Sleeve Printed Camp Shirt | Grab our easygoing camp shirt for a casual loo... | 53.18 | 30.2 | deals | jcpenney\|shops\|deals | 4.0 | http://www.jcpenney.com/island-shores-short-sl... | http://s7d9.scene7.com/is/image/JCPenney/DP021... | Island Shores | 4 | [{'User': 'zqyj2343', 'Review': 'great-looking... | [bda6fba5691b4e9ef44cc5150f0c27f0, 52d46a6d98d... |
| e085bf92899ae12985b6fcb5dc62755c | pp5004910916 | Liz Claiborne® Secretly Slender™ Cropped Leggings | A wide waistband lends pleasing visual proport... | 53.46 | 32.08 | leggings | jcpenney\|handbags-accessories\|leggings | 2.75 | http://www.jcpenney.com/liz-claiborne-secretly... | http://s7d9.scene7.com/is/image/JCPenney/DP061... | LIZ CLAIBORNE | 8 | [{'User': 'cree2134', 'Review': 'These cropped... | [09c14b1be613175515e913b1d914f30c, 3693b845cf3... |
| 8d217c810b867f368849ba371e99fb13 | pp5007100055 | Bisou Bisou® Smocked Lace Shorts | Our shorts feature a smocked waist with lace d... | | 51.67 | bisou bisou | jcpenney\|women\|bisou bisou | 2.0 | http://www.jcpenney.com/bisou-bisou-smocked-la... | http://s7d9.scene7.com/is/image/JCPenney/DP040... | Bisou Bisou | 1 | [{'User': 'vzph4142', 'Review': 'These are ver... | [b8d314350b207361044ab67fb5559bce, cb6c18fd7db... |
| a6fc4982a456e887f2a035ff43fa84b6 | pp5005071148 | EasyFit Wrap-Around Solid Ruffled Bedskirt | Effortlessly hide box springs and bed frames w... | 48.34 | 24.16 | bed skirts | jcpenney\|bed skirts | 2.625 | http://www.jcpenney.com/easyfit-wrap-around-so... | http://s7d2.scene7.com/is/image/JCPenney/DP020... | Asstd National Brand | 8 | [{'User': 'somk3223', 'Review': 'So easy to pu... | [e5e77897f032ee1b7653e857f4abd40b, d29d34fef11... |
| ba1c0b962a3da07badb77329fa55bcfd | pp5006960892 | Hybrid Short-Sleeve Graphic Tee | Look cool and casual in our graphic tee. crewn... | 16.44 | 9.38 | graphic t-shirts | jcpenney\|juniors-guys\|graphic t-shirts | 4.5 | http://www.jcpenney.com/hybrid-short-sleeve-gr... | http://s7d9.scene7.com/is/image/JCPenney/DP021... | Asstd National Brand | 2 | [{'User': 'vopj4124', 'Review': 'We are going ... | [cfb7e403a32233578ae11ea13806936c, 4e2364037d4... |
| c64512e018dd323ba29fe1c0e7f22b39 | pp5006210802 | Levi's® 514™ Straight Stretch Jeans | With a straight-leg fit that's not too loose a... | 71.91 | 48.33 | view all jeans | jcpenney\|view all jeans | 2.875 | http://www.jcpenney.com/levis-514-straight-str... | http://s7d2.scene7.com/is/image/JCPenney/DP061... | Levi | 8 | [{'User': 'xlah3234', 'Review': 'I consistentl... | [2b085fa05f5ca65179bcfd0c98c6d43e, f6bb8f05406... |
| 2841e1f47d569c3faf3eefc032c43d0a | ens6003650073 | Croscill Classics® Cassandra Bath Collection | | | 12.0748 | | | 3.444444 | http://www.jcpenney.com/croscill-classics-cass... | http://s7d9.scene7.com/is/image/JCPenney/DP021... | Croscill Classics | 9 | [{'User': 'zrxh2223', 'Review': 'I recently pu... | [0d9cd6442b22b5780fd0e4b5855e757f, 2cbf43cc137... |


- Displayed above is a random sample of 10 products.
- Each product has a `Reviews` column holding a list of its reviews, one dictionary per review.
- Each review dictionary has the keys:
    1. User
    2. Review
    3. Score
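Because `Reviews` holds a list of dictionaries per product, it is convenient to flatten it into one row per review. A minimal sketch using `DataFrame.explode` and `pd.json_normalize`, run on a tiny hand-made frame that mimics the structure above (the product ids, usernames and scores in `demo_products` are made up).

```python
import pandas as pd

# Hypothetical stand-in for jcpenney_products: two products with nested review lists
demo_products = pd.DataFrame(
    {
        "uniq_id": ["p1", "p2"],
        "Reviews": [
            [{"User": "u1", "Review": "Great", "Score": 5},
             {"User": "u2", "Review": "Poor", "Score": 1}],
            [{"User": "u1", "Review": "OK", "Score": 3}],
        ],
    }
).set_index("uniq_id")

# explode() gives one row per list element and repeats the product id;
# dropna() discards products whose review list is empty
exploded = demo_products["Reviews"].explode().dropna()

# json_normalize() turns each review dict into columns User, Review, Score
flat = pd.json_normalize(exploded.tolist())
flat.index = exploded.index  # re-attach the product id to each review
print(flat)
```

The result is in the same long shape as `reviews.csv`, which makes the two sources easy to compare.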

In [3]:
jcpenney_products.info()

<class 'pandas.core.frame.DataFrame'>
Index: 7982 entries, b6c0b6bea69c722939585baeac73c13d to 2cc49292b44cc12fe22206440d3e7472
Data columns (total 14 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   sku                     7982 non-null   object 
 1   name_title              7982 non-null   object 
 2   description             7982 non-null   object 
 3   list_price              7982 non-null   object 
 4   sale_price              7982 non-null   object 
 5   category                7982 non-null   object 
 6   category_tree           7982 non-null   object 
 7   average_product_rating  7982 non-null   float64
 8   product_url             7982 non-null   object 
 9   product_image_urls      7982 non-null   object 
 10  brand                   7982 non-null   object 
 11  total_number_reviews    7982 non-null   int64  
 12  Reviews                 7982 non-null   object 
 13  Bought With             7982 non-null   object 

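`list_price` and `sale_price` are stored as text (dtype `object`) rather than numbers. Missing list prices are empty strings, which is why `info()` reports no nulls. We convert `list_price` to float so that it can be summarised numerically.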

In [4]:
# Convert list_price from text to float; empty strings (missing prices) become None, stored as NaN
jcpenney_products['list_price'] = [float(price) if price else None
                                   for price in jcpenney_products['list_price']]
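An alternative to the list comprehension is `pd.to_numeric` with `errors="coerce"`, which works on a whole column and turns empty strings (and any other unparseable text) into `NaN`. A sketch that converts both price columns, since `sale_price` is also stored as `object`; the values in `raw` are made up.

```python
import pandas as pd

# Made-up values mimicking the raw JSON: prices arrive as strings, '' when missing
raw = pd.DataFrame({"list_price": ["48.34", "", "-65.27"],
                    "sale_price": ["22.96", "51.67", "bad"]})

for col in ["list_price", "sale_price"]:
    # errors="coerce" maps '' and non-numeric text to NaN instead of raising
    raw[col] = pd.to_numeric(raw[col], errors="coerce")

print(raw.dtypes)
```

The trade-off is that `coerce` never stops the notebook on unexpected text, but it also hides such text, so comparing `isna().sum()` before and after the conversion is a worthwhile check.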

In [5]:
jcpenney_products.describe()

| | list_price | average_product_rating | total_number_reviews |
|---|---|---|---|
| count | 5816.0 | 7982.0 | 7982.0 |
| mean | 144.776618 | 2.988683 | 4.893886 |
| std | 499.223719 | 0.911673 | 3.314284 |
| min | -65.27 | 1.0 | 1.0 |
| 25% | 40.7 | 2.5 | 2.0 |
| 50% | 58.01 | 3.0 | 4.0 |
| 75% | 87.02 | 3.5 | 8.0 |
| max | 17122.17 | 5.0 | 23.0 |


- Some `list_price` values are negative (the minimum is -65.27), which is impossible for a price.
- We assume the minus sign is a data-entry error and replace each price with its absolute value.

In [6]:
# Flip negative prices to positive (assumes the sign is a data-entry error); NaN stays NaN
jcpenney_products['list_price'] = jcpenney_products['list_price'].abs()
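Before flipping signs it helps to know how many prices are affected, and to keep in mind the alternative of treating negative prices as missing rather than guessing their true value. A sketch on a made-up `prices` Series:

```python
import numpy as np
import pandas as pd

prices = pd.Series([48.34, -65.27, np.nan, 12.5, -8.01])  # made-up values

# NaN < 0 evaluates to False, so missing prices are not counted as negative
negative = prices < 0
print(negative.sum(), "negative prices")

# Option 1: assume the minus sign is a typo and flip it
as_abs = prices.abs()
# Option 2: treat negative prices as unknown instead of guessing
as_missing = prices.mask(negative)
```

Option 1 keeps more data; option 2 is safer if the negative values turn out to be refunds or other non-price entries.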

In [7]:
jcpenney_products.describe()

| | list_price | average_product_rating | total_number_reviews |
|---|---|---|---|
| count | 5816.0 | 7982.0 | 7982.0 |
| mean | 145.424555 | 2.988683 | 4.893886 |
| std | 499.035327 | 0.911673 | 3.314284 |
| min | 8.01 | 1.0 | 1.0 |
| 25% | 41.0425 | 2.5 | 2.0 |
| 50% | 58.01 | 3.0 | 4.0 |
| 75% | 87.02 | 3.5 | 8.0 |
| max | 17122.17 | 5.0 | 23.0 |


### jcpenney_reviewers.json

In [8]:
reviewers = pd.read_json('jcpenney_reviewers.json', lines=True)
jcpenney_reviewers = reviewers.set_index('Username')
jcpenney_reviewers.sample(10)


| Username | DOB | State | Reviewed |
|---|---|---|---|
| ytct4242 | 07.08.1953 | Colorado | [ee22001ad4bd39ed5bc4e149508efaf7, 35a3adc8661... |
| edse4341 | 27.07.1999 | Delaware | [3290b2e89ee77e3d0b6df70991a83a5f] |
| sqcq4333 | 02.08.1974 | Michigan | [9a977788e18891eefa98bd6a6f313aa0] |
| qwia3432 | 28.07.1993 | Massachusetts | [dc99dc23e9e9caea26ccc3838bbcdc18, 8c3c6898b34... |
| nbhr1133 | 31.07.1980 | Nevada | [e781043784f4470aaee0f3d3b1f2b0ce] |
| fxjs4224 | 07.08.1953 | Montana | [8ca3ee278859b4982f177c558786d8b4] |
| dpfn4142 | 03.08.1969 | Kentucky | [3e73c327d6ccdfc4f9c56031b8968f5d] |
| hljb2211 | 01.08.1978 | North Dakota | [] |
| fqtk4432 | 05.08.1960 | Arkansas | [] |
| johj2441 | 29.07.1990 | American Samoa | [b126ec281550a54815a619149b3162ad, 2209b487231... |


In [9]:
jcpenney_reviewers.info()

<class 'pandas.core.frame.DataFrame'>
Index: 5000 entries, bkpn1412 to yuoc2324
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   DOB       5000 non-null   object
 1   State     5000 non-null   object
 2   Reviewed  5000 non-null   object
dtypes: object(3)
memory usage: 156.2+ KB
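The `Reviewed` column holds a list of product ids per user, and some lists are empty (for example `hljb2211` in the sample). A sketch for counting reviews per user and expanding the lists into user-product pairs that could later be joined to the product table; `demo_reviewers` is a made-up stand-in for `jcpenney_reviewers`.

```python
import pandas as pd

# Hypothetical stand-in for jcpenney_reviewers
demo_reviewers = pd.DataFrame(
    {"Username": ["a", "b", "c"],
     "Reviewed": [["p1", "p2"], [], ["p2"]]}
).set_index("Username")

# Number of products each user reviewed; empty lists give 0
demo_reviewers["n_reviewed"] = demo_reviewers["Reviewed"].apply(len)
print((demo_reviewers["n_reviewed"] == 0).sum(), "users reviewed nothing")

# One row per (user, product) pair; explode() turns [] into NaN, which dropna() removes
pairs = (demo_reviewers["Reviewed"].explode().dropna()
         .rename("uniq_id").reset_index())
print(pairs)
```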


In [10]:
# Convert the DOB column to datetime
jcpenney_reviewers['DOB'] = pd.to_datetime(jcpenney_reviewers['DOB'], format="%d.%m.%Y")
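With `DOB` stored as a datetime, each reviewer's age can be derived, which is easier to group and plot than a raw date. A sketch; the reference date of 1 January 2024 is an assumption chosen for illustration, and the dates are made up in the same `%d.%m.%Y` format as the file. The age bands are likewise our own choice.

```python
import pandas as pd

dob = pd.to_datetime(pd.Series(["07.08.1953", "27.07.1999", "01.01.2000"]),
                     format="%d.%m.%Y")
reference = pd.Timestamp("2024-01-01")  # assumed "as of" date

# Subtract birth years, then take one off if the birthday has not yet
# happened in the reference year
had_birthday = (dob.dt.month < reference.month) | (
    (dob.dt.month == reference.month) & (dob.dt.day <= reference.day)
)
age = reference.year - dob.dt.year - (~had_birthday).astype(int)

# Group ages into bands; right=False makes each band include its lower edge
bands = pd.cut(age, bins=[0, 30, 50, 70, 120],
               labels=["<30", "30-49", "50-69", "70+"], right=False)
print(age.tolist(), bands.tolist())
```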

In [11]:
jcpenney_reviewers.info()

<class 'pandas.core.frame.DataFrame'>
Index: 5000 entries, bkpn1412 to yuoc2324
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype         
---  ------    --------------  -----         
 0   DOB       5000 non-null   datetime64[ns]
 1   State     5000 non-null   object        
 2   Reviewed  5000 non-null   object        
dtypes: datetime64[ns](1), object(2)
memory usage: 156.2+ KB


In [12]:
jcpenney_reviewers.describe()

| | DOB |
|---|---|
| count | 5000 |
| mean | 1975-10-28 03:53:16.800000 |
| min | 1950-08-08 00:00:00 |
| 25% | 1962-08-05 00:00:00 |
| 50% | 1975-08-02 00:00:00 |
| 75% | 1988-07-29 00:00:00 |
| max | 2001-07-26 00:00:00 |


### products.csv

In [13]:
products = pd.read_csv('products.csv', index_col='Uniq_id')
products.sample(10)

| Uniq_id | SKU | Name | Description | Price | Av_Score |
|---|---|---|---|---|---|
| 8e04f398f858b635f09374fc51880664 | pp5005071429 | Stafford® Travel Medium Blue Suit Separates - ... | | | 3.111111 |
| db216a1570f881dff6a2efa3bb8c6218 | pp5002461118 | Vanity Fair® Perfectly Yours® Seamless Tailore... | Vanity Fair high-waistbriefs are seamless for ... | | 3.875 |
| f9a9f0851388dfb725b325b14e33a4a6 | 1a31944 | CLOSEOUT! Seventeen® Crystal Violet Comforter ... | | | 2.666667 |
| e6d507d37586b24d789006129520d78e | pp5007150083 | Danny & Nicole® Sleeveless Stripe Fit-and-Flar... | Our figure-flattering fit-and-flare dress is d... | 103.94 | 4.0 |
| 918e3186ce53a3dc9dc19c44b69a17fb | pp5005730201 | Van Heusen® Dress Pants - Boys 8-20 | These handsome IZOD pants create the perfect p... | 74.3 | 4.333333 |
| ee18f7afd6d84067d194b6f2ab6aeca9 | pp5006081870 | Arizona Lace-Trim Swing Tank Top | Keep your look fresh and interesting in our la... | 30.6 | 2.333333 |
| e86b28f8b644d62cdb7e8313259e4cd0 | pp5006660007 | Stylus™ Linen Crop Pants | Whether you wear them for work or play, our co... | 53.18 | 1.0 |
| 759d0c16d5d0707c6e63ae28eb3bd335 | pp5006210555 | Clarks® Leisa Grove Leather Sandals - Wide Width | Stay comfortable and looking great all day wit... | 96.69 | 2.714286 |
| a69bf232d2f3001b5815b0f46fc39fd0 | pp5004451228 | Hamilton Beach® Searing Grill with Removable Lid | Get outdoor flavors indoors and create a beaut... | 132.95 | 2.428571 |
| 37d39dadb0d4fc9e8577224d6c869300 | pp5006100212 | Melissa & Doug® BonBon Bear | Bear hugs are super sweet with this cuddly bea... | 23.65 | 4.0 |


In [14]:
products.info()

<class 'pandas.core.frame.DataFrame'>
Index: 7982 entries, b6c0b6bea69c722939585baeac73c13d to 2cc49292b44cc12fe22206440d3e7472
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   SKU          7915 non-null   object 
 1   Name         7982 non-null   object 
 2   Description  7439 non-null   object 
 3   Price        5816 non-null   float64
 4   Av_Score     7982 non-null   float64
dtypes: float64(2), object(3)
memory usage: 374.2+ KB


In [15]:
products.describe()

| | Price | Av_Score |
|---|---|---|
| count | 5816.0 | 7982.0 |
| mean | 144.776618 | 2.988683 |
| std | 499.223719 | 0.911673 |
| min | -65.27 | 1.0 |
| 25% | 40.7 | 2.5 |
| 50% | 58.01 | 3.0 |
| 75% | 87.02 | 3.5 |
| max | 17122.17 | 5.0 |


In [16]:
# Flip negative prices to positive, as for list_price in the JSON data; NaN stays NaN
products['Price'] = products['Price'].abs()

In [17]:
products.describe()

| | Price | Av_Score |
|---|---|---|
| count | 5816.0 | 7982.0 |
| mean | 145.424555 | 2.988683 |
| std | 499.035327 | 0.911673 |
| min | 8.01 | 1.0 |
| 25% | 41.0425 | 2.5 |
| 50% | 58.01 | 3.0 |
| 75% | 87.02 | 3.5 |
| max | 17122.17 | 5.0 |
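The `products.csv` summary statistics match the JSON `list_price` and `average_product_rating` statistics exactly, both before and after the sign fix, which suggests that `Price` and `Av_Score` are copies of those fields. A sketch for confirming this row by row rather than from the summaries; `json_side` and `csv_side` are made-up frames, with one deliberate mismatch.

```python
import numpy as np
import pandas as pd

json_side = pd.DataFrame({"list_price": [48.34, np.nan, 53.18],
                          "average_product_rating": [2.0, 3.875, 4.0]},
                         index=["a", "b", "c"])
csv_side = pd.DataFrame({"Price": [53.18, np.nan, 48.34],
                         "Av_Score": [4.5, 3.875, 2.0]},
                        index=["c", "b", "a"])

# Align on the product id so that row order in the two files does not matter
joined = json_side.join(csv_side, how="inner")

# Compare floats with a tolerance; equal_nan=True treats two missing prices as a match
price_ok = np.isclose(joined["list_price"], joined["Price"], equal_nan=True)
score_ok = np.isclose(joined["average_product_rating"], joined["Av_Score"])
print(joined[~(price_ok & score_ok)])
```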


### reviews.csv

In [18]:
reviews = pd.read_csv('reviews.csv', index_col='Uniq_id')
reviews.sample(10)

| Uniq_id | Username | Score | Review |
|---|---|---|---|
| 2359d2eecb730d412e1012e15cf4e00a | fiha2232 | 1 | Im 60, 5ft, 125 lbs and on a quest for the per... |
| 66f9eae18059eb918fe62d5fa5588d92 | gndc3142 | 4 | The carafe keeps coffee hot for a long, long t... |
| 576ae3b84626e43e79dd4e3f49bd8424 | rrvn2421 | 0 | I applied $20 coupon and 15% off two sale item... |
| f4971ef583a5938125f243ee308fb2e6 | dvzv3143 | 1 | I was hesitant to buy eyeliner online but I wa... |
| bd1d7b3c6983ab19a8ed222785905acb | fwus2213 | 1 | I bought this shoe for work and am loving it! ... |
| e39f3d63f2290a7f3fe59ecbd1c6f7b3 | bpgr4424 | 1 | I ordered two: the Paris red and the Green/nav... |
| 73fdd31c8a9420d000c00d5dbbfdb97d | sqen1321 | 1 | Two weeks ago I purchased the Stila Smudge Sti... |
| b1f0b874afad64e3cf964477a2703187 | ixsa3224 | 1 | Great fit, so happy to find petite length. I g... |
| 36aa1f0cbb4100dbb14833b4612f6ba5 | pckz3233 | 0 | shoes were poorly made cheap quality returned ... |
| 851e01723d7c47a086beb356c0169589 | zgat4123 | 1 | Looks good with my other pillows in my Florida... |


In [19]:
reviews.info()

<class 'pandas.core.frame.DataFrame'>
Index: 39063 entries, b6c0b6bea69c722939585baeac73c13d to 2cc49292b44cc12fe22206440d3e7472
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   Username  39063 non-null  object
 1   Score     39063 non-null  int64 
 2   Review    39063 non-null  object
dtypes: int64(1), object(2)
memory usage: 1.2+ MB


In [20]:
reviews.describe()

| | Score |
|---|---|
| count | 39063.0 |
| mean | 1.487648 |
| std | 1.400332 |
| min | 0.0 |
| 25% | 0.0 |
| 50% | 1.0 |
| 75% | 2.0 |
| max | 5.0 |
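`reviews.csv` stores one row per review, so its `Uniq_id` index repeats. Its 39,063 rows also match the total implied by the JSON summary (mean `total_number_reviews` of 4.893886 × 7,982 products ≈ 39,063), so counting rows per product should reproduce `total_number_reviews`. A sketch of that check on made-up frames (`demo_reviews`, `demo_products`), with one deliberate mismatch. Note also that `Score` here runs from 0 to 5 while `average_product_rating` runs from 1 to 5, so the two may not be on the same scale; the same join could be extended to compare them.

```python
import pandas as pd

demo_reviews = pd.DataFrame({"Uniq_id": ["p1", "p1", "p2", "p3"],
                             "Score": [1, 4, 0, 2]}).set_index("Uniq_id")
demo_products = pd.DataFrame({"total_number_reviews": [2, 1, 3]},
                             index=pd.Index(["p1", "p2", "p3"], name="Uniq_id"))

# groupby(level=0) counts the rows for each product id in the non-unique index
counts = demo_reviews.groupby(level=0).size().rename("csv_count")

# Products with no rows in the CSV would get NaN from the join, so fill with 0
check = demo_products.join(counts)
check["csv_count"] = check["csv_count"].fillna(0).astype(int)
mismatch = check[check["csv_count"] != check["total_number_reviews"]]
print(mismatch)
```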


### users.csv

In [21]:
users = pd.read_csv('users.csv', index_col='Username')
users.sample(10)

| Username | DOB | State |
|---|---|---|
| xhfq1322 | 08.08.1951 | Maryland |
| pgdr4442 | 07.08.1952 | Guam |
| psdx3143 | 29.07.1990 | Alaska |
| iaob3322 | 06.08.1956 | Alaska |
| vwzm2242 | 03.08.1970 | California |
| vafx3132 | 05.08.1960 | Maryland |
| hxsq3412 | 05.08.1960 | Missouri |
| ccwf4221 | 03.08.1971 | Alaska |
| alxz1243 | 27.07.1998 | Idaho |
| vygj4433 | 07.08.1955 | Delaware |


In [22]:
users.info()

<class 'pandas.core.frame.DataFrame'>
Index: 5000 entries, bkpn1412 to yuoc2324
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   DOB     5000 non-null   object
 1   State   5000 non-null   object
dtypes: object(2)
memory usage: 117.2+ KB


In [23]:
# convert DOB to datetime object
users['DOB'] = pd.to_datetime(users['DOB'], format="%d.%m.%Y")

In [24]:
users.info()

<class 'pandas.core.frame.DataFrame'>
Index: 5000 entries, bkpn1412 to yuoc2324
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   DOB     5000 non-null   datetime64[ns]
 1   State   5000 non-null   object        
dtypes: datetime64[ns](1), object(1)
memory usage: 117.2+ KB


In [25]:
users.describe()

| | DOB |
|---|---|
| count | 5000 |
| mean | 1975-10-28 03:53:16.800000 |
| min | 1950-08-08 00:00:00 |
| 25% | 1962-08-05 00:00:00 |
| 50% | 1975-08-02 00:00:00 |
| 75% | 1988-07-29 00:00:00 |
| max | 2001-07-26 00:00:00 |
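The `users.csv` summary is identical to the `jcpenney_reviewers.json` one, which suggests that `users.csv` is the reviewer table without the `Reviewed` column. A sketch for confirming this directly rather than inferring it from the summaries; `demo_users` and `demo_reviewers` are made-up frames holding the same users in a different order.

```python
import pandas as pd

demo_users = pd.DataFrame(
    {"Username": ["ytct4242", "edse4341"],
     "DOB": pd.to_datetime(["1953-08-07", "1999-07-27"]),
     "State": ["Colorado", "Delaware"]}
).set_index("Username")
demo_reviewers = pd.DataFrame(
    {"Username": ["edse4341", "ytct4242"],
     "DOB": pd.to_datetime(["1999-07-27", "1953-08-07"]),
     "State": ["Delaware", "Colorado"],
     "Reviewed": [[], ["p1"]]}
).set_index("Username")

# Sort both by username so that row order does not affect the comparison
same = demo_users.sort_index().equals(demo_reviewers[["DOB", "State"]].sort_index())
# Usernames present in one table but not the other
only_in_one = demo_users.index.symmetric_difference(demo_reviewers.index)
print(same, len(only_in_one))
```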


## Data Visualization

- In this section we start looking at relationships between variables
- We do this through charts
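As a first chart, a histogram of `list_price` on a logarithmic axis copes with the long right tail seen in `describe()` (median 58.01 but maximum 17,122.17). A sketch with matplotlib, assuming it is installed; `demo_prices` is random stand-in data rather than the real column, and the bin count of 30 edges is an arbitrary choice.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
demo_prices = pd.Series(rng.lognormal(mean=4.0, sigma=1.0, size=500))  # stand-in data

# Log-spaced bin edges so that each bar covers the same ratio of prices;
# min() and max() skip NaN, and dropna() removes missing prices before plotting
bins = np.logspace(np.log10(demo_prices.min()), np.log10(demo_prices.max()), 30)
fig, ax = plt.subplots(figsize=(8, 4))
ax.hist(demo_prices.dropna(), bins=bins)
ax.set_xscale("log")
ax.set_xlabel("List price (log scale)")
ax.set_ylabel("Number of products")
ax.set_title("Distribution of list prices")
fig.tight_layout()
```

In the notebook, replacing `demo_prices` with `jcpenney_products['list_price']` would plot the real data.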