This is our data engineering project at HCMUS, Ho Chi Minh City, Vietnam.
We used the Scrapy framework to crawl the Amazon website, scrape book information, and store the data in a local database. After preprocessing the data, we used the Tableau Public platform for data visualization.
1. Scrapy is used to crawl websites and extract structured data from their pages.
2. Docker is used to run and manage the PostgreSQL database.
3. PostgreSQL is a powerful, open-source object-relational database system.
4. Tableau Public is a free platform to explore, create and publicly share data visualizations online. Here is the link to our dashboards: https://public.tableau.com/app/profile/uyen5114/viz/AmazonBookDetail/BOOK_DETAILS
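
The preprocessing step mentioned above could look roughly like the sketch below. It is not the project's actual code: the class name `CleanBookPipeline`, the helper `first_number`, and the field names `title`, `price`, and `rating` are all assumptions made for illustration, as are the raw string formats. The only Scrapy convention relied on is that an item pipeline is a plain class with a `process_item(self, item, spider)` method.

```python
import re
from typing import Optional

# Matches the first integer or decimal number in a string.
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def first_number(raw: Optional[str]) -> Optional[float]:
    """Return the first number found in raw, or None if there is none.

    Thousands separators are removed first, so "$1,299.50" becomes 1299.5.
    """
    if not raw:
        return None
    match = _NUMBER.search(raw.replace(",", ""))
    return float(match.group()) if match else None


class CleanBookPipeline:
    """Hypothetical item pipeline that normalizes scraped book fields.

    The field names are assumptions; adapt them to the spider's real items.
    """

    def process_item(self, item, spider):
        cleaned = dict(item)
        # Collapse runs of whitespace and newlines inside the title.
        cleaned["title"] = " ".join((cleaned.get("title") or "").split())
        # Assumed raw formats: "$12.99" and "4.5 out of 5 stars".
        cleaned["price"] = first_number(cleaned.get("price"))
        cleaned["rating"] = first_number(cleaned.get("rating"))
        return cleaned
```

In a Scrapy project, a class like this would be enabled through the `ITEM_PIPELINES` setting, so that each item is cleaned before it is written to PostgreSQL.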