Final Project for Misk Skills Data Science Immersive Bootcamp (2021)

amjaadalqahtani/DS_Capstone_TeaRecommender

DS Capstone: Tea Recommender (2021)

A tea recommendation system built from product information scraped from online tea stores and health benefits gathered from scientific journals.
It suggests teas based on user preference and tea ingredients. The algorithm takes a tea name as input and returns the five most similar teas as output. This project fulfils the capstone requirement of the "Data Science Immersive" course by Misk Skills and MCIT.
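The name-in, top-5-out behaviour described above can be illustrated with a minimal sketch. This is not the project's actual code: it assumes a plain bag-of-words representation of each tea's text and cosine similarity as the ranking measure, and the function names (`tokenize`, `cosine_similarity`, `recommend`) and the toy catalogue are hypothetical. See the notebook in the TeaRecommender folder for the real implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(tea_name, teas, top_n=5):
    """Return the names of the top_n teas most similar to tea_name.

    `teas` maps each tea name to a text string such as its ingredients
    (a hypothetical input format chosen for this sketch).
    """
    if tea_name not in teas:
        raise KeyError(f"Unknown tea: {tea_name!r}")
    vectors = {name: Counter(tokenize(text)) for name, text in teas.items()}
    query = vectors[tea_name]
    scores = [
        (cosine_similarity(query, vec), name)
        for name, vec in vectors.items()
        if name != tea_name
    ]
    # Sort by descending similarity, breaking ties alphabetically by name.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scores[:top_n]]


# Toy, made-up catalogue for illustration only (not rows from the dataset).
toy_teas = {
    "Jasmine Green": "green tea leaves jasmine petals",
    "Plain Green": "green tea leaves",
    "Mint Green": "green tea leaves peppermint",
    "Chamomile Calm": "chamomile flowers lavender",
    "Spiced Black": "black tea leaves cinnamon ginger",
    "Ginger Lemon": "ginger lemon peel",
    "Lavender Dream": "lavender chamomile flowers rose petals",
}

print(recommend("Jasmine Green", toy_teas))
```

Raw word counts keep the sketch dependency-free. A weighting scheme such as TF-IDF would down-weight words like "tea" and "leaves" that appear in almost every product.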

Folders

TeaRecommender: contains the final project as a Jupyter notebook and the .py source code.
data-tidying: the R source code for tidying raw scraped data into the final dataset found in the data folder.
data: the final datasets used for the recommender.
web-scraping: contains all web-scraping files and resulting .csv files.

Data Dictionary

| Field Name | Data Type | Description | Example |
| --- | --- | --- | --- |
| Health Problem | Character | Themes of similar health problems based on type | inflammation |
| Health Benefit | Character | Tea health benefits | anti-inflammatory |
| Name | Character | Name of tea from scraped content | Green Jasmine Allure |
| Category | Character | Family of tea based on harvesting and processing methods | Green Tea |
| Time | Character | Best time of day to drink it | night, day, anytime |
| Description | Character | Product description from scraped content | Green tea blend with alluring jasmine. Reduce the risk of developing many forms of cancer. Lower total cholesterol levels. Relaxing, calming effect. Improves Digestion. |
| Ingredients | Character | Product ingredients from scraped content | Green tea leaves, pure jasmine petals |
| Flavor | Character | Tea flavor profile | Earthy flavor with aromatic jasmine after taste |
| Color | Character | Color when brewed | green |
| Caffeine | Character | Presence of caffeine in tea | no, yes |
| Price | Integer | Price of tea | 12.00 |
| item_ID | Integer | Identifying number for tea name | 990720US01 |
| ID | Integer | Identifying number for tea name, as generated by RStudio | 14 |

Contributing

The file "clean_megalist.csv" is compiled from 9 different tea brands, and the zip file "scraped_teabrands.zip" contains the raw scraped data. Both can be found in the "data" folder. I made them for the class, but they can also be used for practice in data science projects. Feel free to experiment with them! The other files are the result of my own research and code for information gathering, data tidying, and modeling a simple recommendation system.
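For anyone who wants to experiment with the dataset, a minimal loading sketch using only the Python standard library might look like the following. It assumes the CSV column headers match the field names in the data dictionary (for example "Name" and "Caffeine"), which has not been verified against the file. The helper names `load_teas` and `caffeine_free` and the in-memory sample rows are made up for illustration.

```python
import csv
import io


def load_teas(csv_file):
    """Read tea rows from an open CSV file object into a list of dicts."""
    return list(csv.DictReader(csv_file))


def caffeine_free(rows):
    """Keep only rows whose Caffeine field is 'no' (case-insensitive)."""
    return [
        row for row in rows
        if row.get("Caffeine", "").strip().lower() == "no"
    ]


# With the real file (path assumed from the folder layout; encoding assumed):
# with open("data/clean_megalist.csv", newline="", encoding="utf-8") as f:
#     teas = load_teas(f)

# Made-up stand-in rows so the sketch runs without the real file.
sample = io.StringIO(
    "Name,Category,Caffeine\n"
    "Example Green,Green Tea,yes\n"
    "Example Herbal,Herbal Tea,no\n"
)
teas = load_teas(sample)
print([row["Name"] for row in caffeine_free(teas)])
```

If the actual headers differ (for instance in capitalization or spacing), adjust the key passed to `row.get` accordingly.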

References

All the references used for this project are listed here: https://rpubs.com/aalqahtani/838154