MaheshYoganandan/Python_Projects

Python Projects

E-commerce Dataset Exploratory Data Analysis (EDA)

Project Overview

Exploratory data analysis of an e-commerce dataset to gain insights, identify patterns, and visualize findings with Seaborn, Matplotlib, and Plotly.

Technologies Used

  • Python
  • Pandas
  • NumPy
  • Seaborn
  • Matplotlib
  • Plotly

Key Findings

  • Analyzed customer demographics, purchase history, and product details
  • Identified correlations and relationships between variables
  • Conducted hypothesis tests and computed confidence intervals for significant findings
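The hypothesis testing and confidence intervals listed above can be illustrated with a minimal, dependency-free sketch. It is not the notebook's code: it assumes the question is whether two customer groups differ in mean purchase amount, and it uses a permutation test and a percentile bootstrap interval rather than any particular parametric test. The purchase amounts are made-up numbers for illustration only.

```python
import random
import statistics


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means between samples a and b."""
    rng = random.Random(seed)
    observed = statistics.mean(a) - statistics.mean(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_resamples):
        # Randomly reassign group labels and recompute the difference in means.
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


def bootstrap_ci(sample, confidence=0.95, n_resamples=10_000, seed=0):
    """Percentile bootstrap confidence interval for the mean of sample."""
    rng = random.Random(seed)
    means = sorted(
        statistics.mean(rng.choices(sample, k=len(sample)))
        for _ in range(n_resamples)
    )
    lower_idx = int((1 - confidence) / 2 * n_resamples)
    upper_idx = int((1 + confidence) / 2 * n_resamples) - 1
    return means[lower_idx], means[upper_idx]


# Made-up purchase amounts for two hypothetical customer groups.
group_a = [120, 135, 150, 160, 145, 155, 170, 140]
group_b = [100, 110, 95, 105, 115, 90, 108, 102]

diff, p_value = permutation_test(group_a, group_b)
low, high = bootstrap_ci(group_a)
print(f"mean difference = {diff:.2f}, p = {p_value:.4f}")
print(f"95% CI for group A mean: ({low:.2f}, {high:.2f})")
```

A permutation test avoids the normality assumption of a t-test, which matters for small or skewed samples; with larger samples, `scipy.stats.ttest_ind` is a common alternative.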

Insights

  • Consumer age group analysis
  • Country-wise analysis
  • Gender classification
  • Income distribution analysis
  • Customer segmentation
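The age-group, country-wise, gender, and income-segmentation analyses above typically come down to binning and grouping in Pandas. The sketch below assumes hypothetical column names (`age`, `country`, `gender`, `income`) and a handful of made-up rows; the real dataset's schema and bin edges may differ.

```python
import pandas as pd

# Made-up customer rows; the column names are assumptions, not the dataset's real schema.
customers = pd.DataFrame({
    "age": [22, 35, 47, 29, 61, 40, 19, 53],
    "country": ["India", "USA", "India", "UK", "USA", "UK", "India", "USA"],
    "gender": ["F", "M", "F", "M", "F", "M", "F", "M"],
    "income": [30_000, 72_000, 55_000, 48_000, 90_000, 61_000, 18_000, 80_000],
})

# Age groups: right-inclusive bins, i.e. (17, 25], (25, 40], (40, 60], (60, 100].
customers["age_group"] = pd.cut(
    customers["age"],
    bins=[17, 25, 40, 60, 100],
    labels=["18-25", "26-40", "41-60", "61+"],
)

# Income segmentation: three quantile-based tiers with roughly equal numbers of customers.
customers["income_tier"] = pd.qcut(customers["income"], q=3, labels=["Low", "Mid", "High"])

age_counts = customers["age_group"].value_counts().sort_index()
country_income = customers.groupby("country")["income"].mean().sort_values(ascending=False)
gender_share = customers["gender"].value_counts(normalize=True)
segments = customers.groupby(["income_tier", "gender"], observed=True).size()

print(age_counts, country_income, gender_share, segments, sep="\n\n")
```

`pd.cut` uses fixed, hand-picked edges, which suits age groups, while `pd.qcut` splits by quantiles so each income tier holds roughly the same number of customers.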

Snaps

[4 screenshots of the retail data analysis]

IMDb Movie Data Scraper

Project Description

Web scraping of movie data from IMDb using Selenium and Beautiful Soup, followed by data cleaning and storage in a CSV file.

Tools Used

  • Selenium
  • Beautiful Soup
  • Requests
  • Python
  • Jupyter Notebook
  • NumPy and Pandas

Process

  • Data collection using Selenium
  • Data extraction using Beautiful Soup
  • Error handling and data cleaning in Jupyter Notebook with NumPy and Pandas
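For the cleaning step, scraped fields usually arrive as display strings that need converting to numbers. The helpers below are a sketch that assumes runtimes look like `2h 22m`, vote counts like `(2.9M)` or `850K`, and years like `(1994)`; these formats are assumptions, not a guarantee of IMDb's current markup, and the sample row is hypothetical.

```python
import re


def parse_runtime(text):
    """Convert a runtime such as '2h 22m', '2h' or '95m' to minutes; None if unparseable."""
    match = re.fullmatch(r"\s*(?:(\d+)h)?\s*(?:(\d+)m)?\s*", text or "")
    if not match or not any(match.groups()):
        return None
    hours, minutes = match.groups()
    return int(hours or 0) * 60 + int(minutes or 0)


def parse_votes(text):
    """Convert a vote count such as '(2.9M)', '850K' or '1,234' to an int; None if unparseable."""
    match = re.fullmatch(r"\s*\(?(\d[\d,]*(?:\.\d+)?)\s*([KkMm]?)\)?\s*", text or "")
    if not match:
        return None
    number, suffix = match.groups()
    multiplier = {"": 1, "k": 1_000, "m": 1_000_000}[suffix.lower()]
    # round() absorbs floating-point noise such as 2.9 * 1e6.
    return round(float(number.replace(",", "")) * multiplier)


def parse_year(text):
    """Return the first four-digit year (1800-2099) found in text such as '(1994)'; None if absent."""
    match = re.search(r"\b(?:18|19|20)\d{2}\b", text or "")
    return int(match.group()) if match else None


# Hypothetical raw row, shaped the way a scraper might collect it.
raw_row = {"title": " Example Movie ", "year": "(1994)", "runtime": "2h 22m", "votes": "(2.9M)"}
clean_row = {
    "title": raw_row["title"].strip(),
    "year": parse_year(raw_row["year"]),
    "runtime_min": parse_runtime(raw_row["runtime"]),
    "votes": parse_votes(raw_row["votes"]),
}
print(clean_row)
```

Each helper returns `None` for unparseable input, so a Pandas column can be converted with, for example, `df["votes"].map(parse_votes)` and the failures inspected or dropped with `dropna`.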

Output

A CSV file containing the cleaned and processed movie data.

Project Stats

  • Extracted data from 1900+ movies
  • 1300+ data points obtained after cleaning and preprocessing

Scalability

The scraper can be scaled to 100,000+ movies by adjusting its parameters and running it for an extended period.
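Running the scraper for an extended period is easier if a run can be interrupted and resumed. One way to do that, sketched here under assumptions, is to append each batch to the output CSV as soon as it finishes and skip IDs that are already on disk. `fetch_movie` and the `movie_id` field are hypothetical placeholders for whatever function and key the real script uses.

```python
import csv
import os


def load_scraped_ids(csv_path, id_field="movie_id"):
    """Return the set of IDs already saved in csv_path (empty if the file does not exist yet)."""
    if not os.path.exists(csv_path):
        return set()
    with open(csv_path, newline="", encoding="utf-8") as f:
        return {row[id_field] for row in csv.DictReader(f)}


def append_rows(csv_path, rows, fieldnames):
    """Append rows to csv_path, writing the header only when the file is new or empty."""
    write_header = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)


def scrape_in_batches(movie_ids, fetch_movie, csv_path, batch_size=50, id_field="movie_id"):
    """Scrape movie_ids in batches, skipping IDs already on disk; returns how many were scraped."""
    done = load_scraped_ids(csv_path, id_field)
    pending = [m for m in movie_ids if m not in done]
    for start in range(0, len(pending), batch_size):
        # fetch_movie(movie_id) is assumed to return a dict that includes id_field.
        batch = [fetch_movie(m) for m in pending[start:start + batch_size]]
        # Saving after every batch means a crash loses at most one batch of work.
        append_rows(csv_path, batch, fieldnames=list(batch[0].keys()))
    return len(pending)
```

Because progress lives in the CSV itself, restarting the same command after an interruption picks up where the previous run stopped.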

Snaps

Raw Data [screenshot]

Cleaned Data [screenshot]

Magic Bricks Data Scraper

Project Overview

This project scrapes real estate data from Magic Bricks, a popular Indian real estate portal. The scraper extracts property details, prices, locations, and more. I undertook this project to demonstrate my web scraping and data cleaning skills.

Technologies Used

Web Scraping

  • Python
  • Selenium
  • Requests
  • Beautiful Soup (initially used, but replaced by Selenium because the site loads listings through infinite scroll)

Data Cleaning

  • Jupyter Notebook
  • Pandas
  • NumPy
  • re (regular expressions)
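Magic Bricks prices are shown in Indian units, so a cleaning step with `re` has to convert them to a single number. The sketch assumes price strings like `₹1.25 Cr`, `₹85 Lac`, or `₹45,00,000`; the exact strings on the site may differ. It relies only on the standard conversions 1 lakh = 100,000 rupees and 1 crore = 10,000,000 rupees.

```python
import re

# Standard Indian numbering units: 1 lakh = 100,000 rupees; 1 crore = 10,000,000 rupees.
UNIT_MULTIPLIERS = {"crore": 10_000_000, "cr": 10_000_000, "lakh": 100_000, "lac": 100_000}

# A number (digits with optional Indian-style commas and decimals), optionally followed by a unit.
PRICE_PATTERN = re.compile(r"(\d[\d,]*(?:\.\d+)?)\s*(crore|cr|lakh|lac)?", re.IGNORECASE)


def parse_price_inr(text):
    """Convert a price such as '₹1.25 Cr' or '₹85 Lac' to whole rupees; None if no number is found."""
    match = PRICE_PATTERN.search(text or "")
    if not match:
        return None
    amount = float(match.group(1).replace(",", ""))
    unit = (match.group(2) or "").lower()
    return round(amount * UNIT_MULTIPLIERS.get(unit, 1))


for raw in ["₹1.25 Cr", "₹85 Lac", "₹45,00,000", "Price on Request"]:
    print(raw, "->", parse_price_inr(raw))
```

Converting everything to rupees up front lets prices be compared, sorted, and aggregated in Pandas without tracking units separately.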

Features

  • Extracts property details from Magic Bricks
  • Handles pagination to scrape multiple pages
  • Saves data to a CSV file
  • Uses error handling to make the script robust
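Pagination and error handling, as listed above, can be combined in a small loop: fetch page after page until one comes back empty, retrying transient failures with exponential backoff. This is a sketch of a common pattern, not the project's actual script; `fetch_page` is a hypothetical callable that returns the listings on a given page number.

```python
import functools
import logging
import time


def retry(max_attempts=3, base_delay=1.0, exceptions=(Exception,)):
    """Retry the decorated function, waiting base_delay, 2*base_delay, ... between attempts."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions as exc:
                    if attempt == max_attempts:
                        raise  # give up and surface the last error
                    delay = base_delay * 2 ** (attempt - 1)
                    logging.warning("Attempt %d/%d failed (%s); retrying in %.1fs",
                                    attempt, max_attempts, exc, delay)
                    time.sleep(delay)
        return wrapper
    return decorator


def scrape_all_pages(fetch_page, max_pages=100):
    """Call fetch_page(1), fetch_page(2), ... until a page returns no listings or max_pages is hit."""
    listings = []
    for page in range(1, max_pages + 1):
        page_listings = fetch_page(page)
        if not page_listings:
            break  # an empty page means we have run past the last page
        listings.extend(page_listings)
    return listings
```

For example, `scrape_all_pages(retry(max_attempts=3)(fetch_page))` gives every page up to three attempts before the error is re-raised.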

Reason for choosing Selenium over Beautiful Soup

The website uses infinite scroll: additional listings are loaded by JavaScript only as the user scrolls, so they are missing from the HTML returned by the initial request that Beautiful Soup parses. Therefore, I used Selenium WebDriver to scroll the page and extract all the details.
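The scroll-until-done approach described above is usually written as a loop that scrolls to the bottom, waits for new listings to load, and stops once the page height stops growing. In this sketch, `driver` is assumed to be a Selenium WebDriver; only its `execute_script` method is used, and the pause length and round limit are illustrative defaults rather than values from the project.

```python
import time


def scroll_to_bottom(driver, pause=2.0, max_rounds=50):
    """Scroll an infinite-scroll page until its height stops growing.

    `driver` is assumed to be a Selenium WebDriver (e.g. webdriver.Chrome());
    only driver.execute_script() is called. Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to load the next batch of listings
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # nothing new was loaded, so we have reached the end
        last_height = new_height
    return max_rounds  # safety cap for pages that never stop growing
```

With a real browser you would call `driver.get(...)` on a search-results URL, then `scroll_to_bottom(driver)`, and finally hand `driver.page_source` to a parser. A fixed `time.sleep` is the simplest option; Selenium's `WebDriverWait` can replace it when a specific element signals that loading has finished.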

Usage

You can use this script to scrape Magic Bricks listing details for any city!

Challenges Faced

  • Extracting the property ID to get each listing's summary
  • Error handling
  • Data cleaning took significant time because insights had to be extracted from each listing's summary, description, and title
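Extracting structured fields from listing text is mostly pattern matching. The sketch below assumes titles shaped like `3 BHK Flat for Sale in Whitefield, Bangalore` and descriptions that mention an area such as `1,450 sqft`; both formats are assumptions for illustration, and real listings will need more patterns.

```python
import re

# Patterns for hypothetical title/description formats.
BHK_PATTERN = re.compile(r"(\d+)\s*BHK", re.IGNORECASE)
AREA_PATTERN = re.compile(r"(\d[\d,]*(?:\.\d+)?)\s*(?:sq\.?\s*ft|sqft)", re.IGNORECASE)
LOCATION_PATTERN = re.compile(r"\bin\s+(.+)$", re.IGNORECASE)


def extract_listing_features(title, description=""):
    """Pull bedroom count, area in sq ft and location out of free-text listing fields."""
    text = f"{title} {description}"
    bhk = BHK_PATTERN.search(text)
    area = AREA_PATTERN.search(text)
    location = LOCATION_PATTERN.search(title)  # location is read from the title only
    return {
        "bhk": int(bhk.group(1)) if bhk else None,
        "area_sqft": float(area.group(1).replace(",", "")) if area else None,
        "location": location.group(1).strip() if location else None,
    }


print(extract_listing_features(
    "3 BHK Flat for Sale in Whitefield, Bangalore",
    "Super area 1,450 sqft, east facing",
))
```

The location rule simply takes everything after the first standalone "in" in the title, so it is a heuristic that will need refining for titles that use "in" more than once.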

Snaps

Raw Data [screenshot]

Cleaned Data [screenshot]
