# University of Stirling

# ITNPBD2 Representing and Manipulating Data

# Assignment Spring 2024

# A Consultancy Job for JC Penney

This notebook is both the instructions and the submission document for the ITNPBD2 assignment. Read the instructions carefully and enter code into the cells as indicated.

You will need these five files, which were in the Zip file you downloaded from the course webpage:

- jcpenney_reviewers.json
- jcpenney_products.json
- products.csv
- reviews.csv
- users.csv

The data in these files describes products that have been sold by the American retail giant, JC Penney, and reviews by customers who bought them. Note that the product data is real, but the customer data is synthetic.

Your job is to process the data, as requested in the instructions in the markdown cells in this notebook.

# Completing the Assignment

Rename this file to be xxxxxx_BD2 where xxxxxx is your student number, then type your code and narrative description into the boxes provided. Add as many code and markdown cells as you need. The cells should contain:

- **Text narrative describing what you did with the data**
- **The code that performs the task you have described**
- **Comments that explain your code**

# Marking Scheme
The assessment will be marked against the University Common Marking Scheme (CMS).

Here is a summary of what you need to achieve to gain a grade in the major grade bands:

|Grade|Requirement|
|:---|:---|
| Fail | You will fail if your code does not run or does not achieve even the basics of the task. You may also fail if you submit code without either comments or a text explanation of what the code does.|
| Pass | To pass, you must submit sufficient working code to show that you have mastered the basics of the task, even if not everything works completely. You must include some justifications for your choice of methods, but without mentioning alternatives. |
| Merit | For a merit, your code must be mostly correct, with only small problems or parts missing, and your comments must be useful rather than simply re-stating the code in English. Most choices for methods and structures should be explained and alternatives mentioned. |
| Distinction | For a distinction, your code must be working, correct, and well commented, and must show an appreciation of style, efficiency and reliability. All choices for methods and structures are concisely justified and alternatives are given well-thought-out consideration. For a distinction, your work should be good enough to present to executives at the company.|

The full details of the CMS can be found here:

https://www.stir.ac.uk/about/professional-services/student-academic-and-corporate-services/academic-registry/academic-policy-and-practice/quality-handbook/assessment-policy-and-procedure/appendix-2-postgraduate-common-marking-scheme/

Note that this means there are not certain numbers of marks allocated to each stage of the assignment. Your grade will reflect how well your solutions and comments demonstrate that you have achieved the learning outcomes of the task. 

## Submission
When you are ready to submit, **print** your notebook as a PDF (go to File -> Print Preview in the Jupyter menu). Make sure you have run all the cells and that their output is displayed. Any lines of code or comments that are not fully visible in the PDF should be broken across several lines. You can then submit the file online.

Late penalties will apply at a rate of three marks per day, up to a maximum of 7 days. After 7 days you will be given a mark of 0. Extensions will be considered under acceptable circumstances outside your control.

## Academic Integrity

This is an individual assignment, and so all submitted work must be fully your own work.

The University of Stirling is committed to protecting the quality and standards of its awards. Consequently, the University seeks to promote and nurture academic integrity, support staff academic integrity, and support students to understand and develop good academic skills that facilitate academic integrity.

In addition, the University deals decisively with all forms of Academic Misconduct.

Where a student does not act with academic integrity, their work or behaviour may demonstrate Poor Academic Practice or it may represent Academic Misconduct.

### Poor Academic Practice

Poor Academic Practice is defined as: "The submission of any type of assessment with a lack of referencing or inadequate referencing which does not effectively acknowledge the origin of words, ideas, images, tables, diagrams, maps, code, sound and any other sources used in the assessment."

### Academic Misconduct

Academic Misconduct is defined as: "any act or attempted act that does not demonstrate academic integrity and that may result in creating an unfair academic advantage for you or another person, or an academic disadvantage for any other member or members of the academic community."

Plagiarism is presenting somebody else’s work as your own **and includes the use of artificial intelligence tools such as GPT or CoPilot**. Plagiarism is a form of academic misconduct and is taken very seriously by the University. Students found to have plagiarised work can have marks deducted and, in serious cases, even be expelled from the University. Do not submit any work that is not entirely your own. Do not collaborate with or get help from anybody else with this assignment.

The University of Stirling's full policy on Academic Integrity can be found at:

https://www.stir.ac.uk/about/professional-services/student-academic-and-corporate-services/academic-registry/academic-policy-and-practice/quality-handbook/academic-integrity-policy-and-academic-misconduct-procedure/

## The Assignment
Your task with this assignment is to use the data provided to demonstrate your Python data manipulation skills.

There are three `.csv` files and two `.json` files so you can process different types of data. The files also contain unstructured data in the form of natural language in English and links to images that you can access from the JC Penney website (use the field called `product_image_urls`).

Start with easy tasks to show you can read in a file, create some variables and data structures, and manipulate their contents. Then move onto something more interesting.

Look at the data that we provided with this assessment and think of something interesting to do with it using whatever libraries you like. Describe what you decide to do with the data and why it might be interesting or useful to the company to do it.

You can add additional data if you need to: either download it or access it using `requests`. Produce working code to implement your ideas in as many cells as you need below. There is no single right answer; the aim is simply to show that you are competent in using Python for data analysis. Exactly how you do that is up to you.

For a distinction class grade, this must show originality, creative thinking, and insights beyond what you've been taught directly on the module.

## Structure
You may structure the project how you wish, but here is a suggested guideline to help you organise your work:

 1. Data Exploration - Explore the data and show you understand its structure and relations
 2. Data Validation - Check the quality of the data. Is it complete? Are there obvious errors?
 3. Data Visualisation - Gain an overall understanding of the data with visualisations
 4. Data Analysis - Set some questions and use the data to answer them
 5. Data Augmentation - Add new data from another source to bring new insights to the data you already have

# Remember to make sure you are working completely on your own.
# Don't work in a group or with a friend
You may NOT use any automated code generation or analytics tools for this assignment, so do not use tools like GPT. You can look up the syntax for the functions you use, but you must write the code yourself and the comments must provide an insightful analysis of the results.



# Assignment Start

## Data Exploration

- We begin by loading the data to see how it is organised.
- We do this by creating a pandas DataFrame for each dataset.
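The two `.json` files are read with `lines=True`, which tells pandas to treat every line of the file as a separate JSON record. The sketch below illustrates that behaviour on an in-memory string instead of the real file. The two records are assumed for illustration, built from values visible in the reviewer output, and `raw` and `frame` are hypothetical names.

```python
import io

import pandas as pd

# Two JSON records, one per line (the "JSON Lines" layout).
# The records are illustrative stand-ins, not read from the real file.
raw = io.StringIO(
    '{"Username": "bkpn1412", "DOB": "31.07.1983", "State": "Oregon"}\n'
    '{"Username": "gqjs4414", "DOB": "27.07.1998", "State": "Massachusetts"}\n'
)

# lines=True parses each line independently and stacks the records as rows.
frame = pd.read_json(raw, lines=True)
print(frame)
```

Without `lines=True`, pandas would expect the whole input to be a single JSON document and would fail on input shaped like this.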

In [1]:
import pandas as pd

### jcpenney_products.json

In [13]:
jcpenney_products = pd.read_json('jcpenney_products.json', lines=True)
jcpenney_products = jcpenney_products.set_index('uniq_id')
jcpenney_products

uniq_id,sku,name_title,description,list_price,sale_price,category,category_tree,average_product_rating,product_url,product_image_urls,brand,total_number_reviews,Reviews,Bought With
b6c0b6bea69c722939585baeac73c13d,pp5006380337,Alfred Dunner® Essential Pull On Capri Pant,You'll return to our Alfred Dunner pull-on cap...,41.09,24.16,alfred dunner,jcpenney|women|alfred dunner,2.625,http://www.jcpenney.com/alfred-dunner-essentia...,http://s7d9.scene7.com/is/image/JCPenney/DP122...,Alfred Dunner,8,"[{'User': 'fsdv4141', 'Review': 'You never hav...","[898e42fe937a33e8ce5e900ca7a4d924, 8c02c262567..."
93e5272c51d8cce02597e3ce67b7ad0a,pp5006380337,Alfred Dunner® Essential Pull On Capri Pant,You'll return to our Alfred Dunner pull-on cap...,41.09,24.16,alfred dunner,jcpenney|women|alfred dunner,3.000,http://www.jcpenney.com/alfred-dunner-essentia...,http://s7d9.scene7.com/is/image/JCPenney/DP122...,Alfred Dunner,8,"[{'User': 'tpcu2211', 'Review': 'You never hav...","[bc9ab3406dcaa84a123b9da862e6367d, 18eb69e8fc2..."
013e320f2f2ec0cf5b3ff5418d688528,pp5006380337,Alfred Dunner® Essential Pull On Capri Pant,You'll return to our Alfred Dunner pull-on cap...,41.09,24.16,view all,jcpenney|women|view all,2.625,http://www.jcpenney.com/alfred-dunner-essentia...,http://s7d9.scene7.com/is/image/JCPenney/DP122...,Alfred Dunner,8,"[{'User': 'pcfg3234', 'Review': 'You never hav...","[3ce70f519a9cfdd85cdbdecd358e5347, b0295c96d2b..."
505e6633d81f2cb7400c0cfa0394c427,pp5006380337,Alfred Dunner® Essential Pull On Capri Pant,You'll return to our Alfred Dunner pull-on cap...,41.09,24.16,view all,jcpenney|women|view all,3.500,http://www.jcpenney.com/alfred-dunner-essentia...,http://s7d9.scene7.com/is/image/JCPenney/DP122...,Alfred Dunner,8,"[{'User': 'ngrq4411', 'Review': 'You never hav...","[efcd811edccbeb5e67eaa8ef0d991f7c, 7b2cc00171e..."
d969a8542122e1331e304b09f81a83f6,pp5006380337,Alfred Dunner® Essential Pull On Capri Pant,You'll return to our Alfred Dunner pull-on cap...,41.09,24.16,view all,jcpenney|women|view all,3.125,http://www.jcpenney.com/alfred-dunner-essentia...,http://s7d9.scene7.com/is/image/JCPenney/DP122...,Alfred Dunner,8,"[{'User': 'nbmi2334', 'Review': 'You never hav...","[0ca5ad2a218f59eb83eec1e248a0782d, 9869fc8da14..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
16e3ca6b6a2d3f5da9a81df6cfe7e27a,pp5002691549,Hoover® WindTunnel® Upright Vacuum Cleaner,This Hoover® vacuum features dual-stage cyclon...,201.97,130.67,view all brands,jcpenney|for-the-home|view all brands,3.875,http://www.jcpenney.com/hoover-windtunnel-upri...,http://s7d9.scene7.com/is/image/JCPenney/DP032...,Hoover,8,"[{'User': 'cced4433', 'Review': '1) the hose i...","[30611c465e893326b8b676ccc5e9e42b, 799d6290601..."
209bec04d9f194616358a77d7b41b314,pp5002691549,Hoover® WindTunnel® Upright Vacuum Cleaner,This Hoover® vacuum features dual-stage cyclon...,201.97,130.67,sale,jcpenney|for-the-home|sale,3.250,http://www.jcpenney.com/hoover-windtunnel-upri...,http://s7d9.scene7.com/is/image/JCPenney/DP032...,Hoover,8,"[{'User': 'rtbj4421', 'Review': '1) the hose i...","[ac4dcfeca034e705ac2be1a79bbe3acf, c2300c3e551..."
7a8a7cba7b69b4c46ceac1d9666d84f0,pp5002691549,Hoover® WindTunnel® Upright Vacuum Cleaner,This Hoover® vacuum features dual-stage cyclon...,201.97,130.67,vacuums & floorcare,jcpenney|for-the-home|vacuums & floorcare,2.500,http://www.jcpenney.com/hoover-windtunnel-upri...,http://s7d9.scene7.com/is/image/JCPenney/DP032...,Hoover,8,"[{'User': 'qven4314', 'Review': '1) the hose i...","[fc831da1b1d09b0bd7d6893e9615232f, 9f5f2c625d4..."
9887bb20c12be3c09d094e97ac8d22be,pp5005921226,Hope Chest Embroidered Quilt & Accessories,,,35.63142,comforters & bedding sets,jcpenney|bed-bath|comforters & bedding sets,3.500,http://www.jcpenney.com/hope-chest-embroidered...,http://s7d9.scene7.com/is/image/JCPenney/DP072...,Asstd National Brand,8,"[{'User': 'sguz3424', 'Review': 'This product ...","[9968d41d84274b22610278d352f0753a, 7ff057a74f7..."


- The output above lists the products, indexed by `uniq_id`.
- Note that each product has a `Reviews` column, which holds a list of that product's reviews as dictionaries.
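Because `Reviews` holds a list of dictionaries per product, it cannot be filtered or grouped directly. One possible way to flatten it, sketched here on a small stand-in frame, is to combine `explode` with `pd.json_normalize`. The frame `sample` and the review text are placeholders assumed for illustration; only the `User` and `Review` keys seen in the output are used, and the real dictionaries may contain more.

```python
import pandas as pd

# Stand-in for jcpenney_products. The review text is placeholder text,
# not the real review content.
sample = pd.DataFrame(
    {
        "uniq_id": [
            "b6c0b6bea69c722939585baeac73c13d",
            "93e5272c51d8cce02597e3ce67b7ad0a",
        ],
        "Reviews": [
            [
                {"User": "fsdv4141", "Review": "placeholder text A"},
                {"User": "krpz1113", "Review": "placeholder text B"},
            ],
            [{"User": "tpcu2211", "Review": "placeholder text C"}],
        ],
    }
).set_index("uniq_id")

# explode() gives each dictionary its own row and repeats the product id.
# dropna() discards products whose review list is empty.
exploded = sample["Reviews"].explode().dropna()

# json_normalize() turns the dictionaries into columns; the product id is
# then restored as the index so every review can be traced to its product.
flat_reviews = pd.json_normalize(exploded.tolist())
flat_reviews.index = exploded.index
print(flat_reviews)
```

The result has one row per review, which is the same shape as `reviews.csv`, so the two sources could be compared for consistency.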

### jcpenney_reviewers.json

In [12]:
jcpenney_reviewers = pd.read_json('jcpenney_reviewers.json', lines=True)
jcpenney_reviewers = jcpenney_reviewers.set_index('Username')
jcpenney_reviewers

Username,DOB,State,Reviewed
bkpn1412,31.07.1983,Oregon,[cea76118f6a9110a893de2b7654319c0]
gqjs4414,27.07.1998,Massachusetts,[fa04fe6c0dd5189f54fe600838da43d3]
eehe1434,08.08.1950,Idaho,[]
hkxj1334,03.08.1969,Florida,"[f129b1803f447c2b1ce43508fb822810, 3b0c9bc0be6..."
jjbd1412,26.07.2001,Georgia,[]
...,...,...,...
mfnn1212,27.07.1997,Delaware,[d6cd506246bd17afa611b6a06236713c]
ejnb3414,01.08.1976,Minnesota,[97de1506cd0bcbe50f2797cd0588eb81]
pdzw1433,28.07.1994,Ohio,"[799d62906019d910fa744987da184ae7, b8f5deb7b02..."
npha1342,07.08.1953,Montana,[6250b1d691cd3842f05b87736f2fadbf]


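- The `jcpenney_reviewers.json` file contains far fewer keys than the products file: only `Username`, `DOB`, `State` and `Reviewed`.
- `Reviewed` is a list of product IDs, and it is empty for some users.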
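Since `Reviewed` is a list column, a quick way to see how active each reviewer is would be to take the list length, and `explode` can produce one row per user and product pair. This is a minimal sketch on three rows copied from the output above; `sample`, `review_counts` and `pairs` are hypothetical names.

```python
import pandas as pd

# Three reviewers taken from the displayed output, one with no reviews.
sample = pd.DataFrame(
    {
        "Username": ["bkpn1412", "gqjs4414", "eehe1434"],
        "Reviewed": [
            ["cea76118f6a9110a893de2b7654319c0"],
            ["fa04fe6c0dd5189f54fe600838da43d3"],
            [],
        ],
    }
).set_index("Username")

# .str.len() works on lists as well as strings; an empty list counts as 0.
review_counts = sample["Reviewed"].str.len()

# One row per (user, product) pair. Empty lists become NaN and are dropped.
pairs = sample["Reviewed"].explode().dropna()

print(review_counts)
print(pairs)
```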

### products.csv

In [6]:
products = pd.read_csv('products.csv', index_col='Uniq_id')
products

Uniq_id,SKU,Name,Description,Price,Av_Score
b6c0b6bea69c722939585baeac73c13d,pp5006380337,Alfred Dunner® Essential Pull On Capri Pant,Youll return to our Alfred Dunner pull-on capr...,41.09,2.625
93e5272c51d8cce02597e3ce67b7ad0a,pp5006380337,Alfred Dunner® Essential Pull On Capri Pant,Youll return to our Alfred Dunner pull-on capr...,41.09,3.000
013e320f2f2ec0cf5b3ff5418d688528,pp5006380337,Alfred Dunner® Essential Pull On Capri Pant,Youll return to our Alfred Dunner pull-on capr...,41.09,2.625
505e6633d81f2cb7400c0cfa0394c427,pp5006380337,Alfred Dunner® Essential Pull On Capri Pant,Youll return to our Alfred Dunner pull-on capr...,41.09,3.500
d969a8542122e1331e304b09f81a83f6,pp5006380337,Alfred Dunner® Essential Pull On Capri Pant,Youll return to our Alfred Dunner pull-on capr...,41.09,3.125
...,...,...,...,...,...
16e3ca6b6a2d3f5da9a81df6cfe7e27a,pp5002691549,Hoover® WindTunnel® Upright Vacuum Cleaner,This Hoover® vacuum features dual-stage cyclon...,201.97,3.875
209bec04d9f194616358a77d7b41b314,pp5002691549,Hoover® WindTunnel® Upright Vacuum Cleaner,This Hoover® vacuum features dual-stage cyclon...,201.97,3.250
7a8a7cba7b69b4c46ceac1d9666d84f0,pp5002691549,Hoover® WindTunnel® Upright Vacuum Cleaner,This Hoover® vacuum features dual-stage cyclon...,201.97,2.500
9887bb20c12be3c09d094e97ac8d22be,pp5005921226,Hope Chest Embroidered Quilt & Accessories,,,3.500
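The last row of the output above has no `Description` or `Price`, which suggests the file contains missing values. A minimal sketch of how these could be counted and located is shown below. It uses two rows copied from the output, leaves out the truncated `Description` column, and `sample` is a hypothetical stand-in for `products`.

```python
import pandas as pd

# Two rows from the displayed output; the second has no price.
sample = pd.DataFrame(
    {
        "Uniq_id": [
            "b6c0b6bea69c722939585baeac73c13d",
            "9887bb20c12be3c09d094e97ac8d22be",
        ],
        "SKU": ["pp5006380337", "pp5005921226"],
        "Price": [41.09, float("nan")],
        "Av_Score": [2.625, 3.5],
    }
).set_index("Uniq_id")

# Count the missing values in every column.
missing_per_column = sample.isna().sum()

# Select the rows where the price is missing for closer inspection.
rows_missing_price = sample[sample["Price"].isna()]

print(missing_per_column)
print(rows_missing_price)
```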


### reviews.csv

In [7]:
reviews = pd.read_csv('reviews.csv', index_col='Uniq_id')
reviews

Uniq_id,Username,Score,Review
b6c0b6bea69c722939585baeac73c13d,fsdv4141,2,You never have to worry about the fit...Alfred...
b6c0b6bea69c722939585baeac73c13d,krpz1113,1,Good quality fabric. Perfect fit. Washed very ...
b6c0b6bea69c722939585baeac73c13d,mbmg3241,2,I do not normally wear pants or capris that ha...
b6c0b6bea69c722939585baeac73c13d,zeqg1222,0,I love these capris! They fit true to size and...
b6c0b6bea69c722939585baeac73c13d,nvfn3212,3,This product is very comfortable and the fabri...
...,...,...,...
2cc49292b44cc12fe22206440d3e7472,whxp1433,2,This bedspread is as practical as it is beauti...
2cc49292b44cc12fe22206440d3e7472,kfqi4333,1,I purchased this same wedding ring quilt 16 ye...
2cc49292b44cc12fe22206440d3e7472,bynj4221,0,This was a gift for my daughter and she loves ...
2cc49292b44cc12fe22206440d3e7472,eawq2222,1,I purchased these pillow shams as well as the ...


### users.csv

In [8]:
users = pd.read_csv('users.csv', index_col='Username')
users

Username,DOB,State
bkpn1412,31.07.1983,Oregon
gqjs4414,27.07.1998,Massachusetts
eehe1434,08.08.1950,Idaho
hkxj1334,03.08.1969,Florida
jjbd1412,26.07.2001,Georgia
...,...,...
mfnn1212,27.07.1997,Delaware
ejnb3414,01.08.1976,Minnesota
pdzw1433,28.07.1994,Ohio
npha1342,07.08.1953,Montana
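The `DOB` column is loaded as text in a day.month.year layout (for example `31.07.1983`), so it needs converting before ages or birth years can be calculated. The sketch below assumes that every value follows this layout and uses three rows copied from the output; `sample` is a hypothetical stand-in for `users`.

```python
import pandas as pd

# Three users copied from the displayed output.
sample = pd.DataFrame(
    {
        "Username": ["bkpn1412", "gqjs4414", "eehe1434"],
        "DOB": ["31.07.1983", "27.07.1998", "08.08.1950"],
        "State": ["Oregon", "Massachusetts", "Idaho"],
    }
).set_index("Username")

# An explicit format avoids any day/month ambiguity and raises an error
# if a value does not match the expected layout.
sample["DOB"] = pd.to_datetime(sample["DOB"], format="%d.%m.%Y")

# With a datetime column, parts of the date are available through .dt
birth_years = sample["DOB"].dt.year
print(birth_years)
```

Passing `errors="coerce"` to `pd.to_datetime` would instead turn malformed dates into `NaT`, which can be useful when checking data quality.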
