This repository showcases my work in data analysis and data science. Each project involves working with messy datasets, applying SQL and Python for data cleaning, and building predictive models to generate insights and solve problems.
This is an ever-evolving repository: some of the tasks presented may not be fully complete yet, and work in progress will continue to be added over time.
I believe the best way to improve is through trial and error, so you may encounter mistakes or less-than-perfect solutions in the code. Rather than hiding them, I have intentionally left them in place to hold myself accountable.
Throughout the projects, comments highlight the different approaches taken, what was done, and what was assumed.
- Handling missing values (imputation, removal, interpolation)
- Correcting inconsistent string values & data types
- Feature engineering (creating new columns, transforming variables)
- Handling outliers & scaling numerical features
- Creating and working with datetime features for time series analysis (a minimal cleaning sketch follows this list)
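The snippet below is a minimal, hedged sketch of the cleaning steps listed above, using pandas on a hypothetical `sales.csv` file; the file name and column names (`price`, `category`, `order_date`) are illustrative assumptions, not columns from any specific project dataset.

```python
import pandas as pd

# Illustrative example only: file name and column names are assumptions.
df = pd.read_csv("sales.csv")

# Handle missing values: impute the numeric column with the median, drop rows
# where the categorical column is missing.
df["price"] = df["price"].fillna(df["price"].median())
df = df.dropna(subset=["category"])

# Correct inconsistent string values and data types.
df["category"] = df["category"].str.strip().str.lower()
df["price"] = df["price"].astype(float)

# Feature engineering: create new columns from existing ones.
df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
df["order_month"] = df["order_date"].dt.to_period("M")

# Handle outliers with the IQR rule, then scale the numeric feature.
q1, q3 = df["price"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]
df["price_scaled"] = (df["price"] - df["price"].mean()) / df["price"].std()
```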
- Writing queries (joins, window functions, aggregations, common table expressions, stored procedures, transactions, and string manipulations)
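To keep the examples in one language, here is a hedged sketch of a query that combines a common table expression with a window function, run through Python's built-in sqlite3 module; the table and column names are made up for illustration, and the projects themselves use MySQL.

```python
import sqlite3
import pandas as pd

# In-memory database with an illustrative table; names and values are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, category TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, 'books', 20.0), (2, 'books', 35.0),
        (3, 'games', 50.0), (4, 'games', 15.0);
""")

# CTE plus a window function: rank orders by amount within each category,
# then keep the top order per category.
query = """
WITH ranked AS (
    SELECT
        order_id,
        category,
        amount,
        RANK() OVER (PARTITION BY category ORDER BY amount DESC) AS amount_rank
    FROM orders
)
SELECT * FROM ranked WHERE amount_rank = 1;
"""
print(pd.read_sql_query(query, conn))
```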
- Visualizing distributions, correlations, and trends
- Generating insights through graphs and statistical summaries
- Detecting patterns & anomalies in data (a short EDA sketch follows this list)
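As an illustration of the EDA steps above, here is a small, hedged sketch using pandas and matplotlib; the DataFrame and its columns are placeholders rather than data from a specific project.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder data; in the projects this comes from the cleaned CSV files.
df = pd.DataFrame({
    "price": [12.0, 15.5, 14.2, 80.0, 13.8, 16.1],
    "quantity": [3, 2, 4, 1, 5, 2],
})

# Statistical summaries and correlations.
print(df.describe())
print(df.corr())

# Distribution of a numeric feature.
df["price"].plot(kind="hist", bins=5, title="Price distribution")
plt.xlabel("price")
plt.show()

# Simple anomaly check: flag values more than 2 standard deviations from the mean.
z_scores = (df["price"] - df["price"].mean()) / df["price"].std()
print(df[z_scores.abs() > 2])
```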
- Supervised Learning: Linear Regression, Logistic Regression, Random Forest, Decision Tree, XGBoost, Gradient Boosting, KNN, and Neural Networks (a minimal fitting sketch follows this list)
- Unsupervised Learning: K-Means Clustering and PCA for dimensionality reduction
- Time Series Forecasting: ARIMA, SARIMA, and Exponential Smoothing
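The sketch below shows the general pattern used for the supervised models, here with a random forest on scikit-learn's built-in Iris dataset as a stand-in for the project data; the same fit/predict flow applies to the other models listed above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Built-in dataset as a stand-in for the project CSVs.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit one of the listed supervised models and evaluate it on held-out data.
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```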
- Objective: Cleaning and visualizing the data, then using statistics and various data science models to draw conclusions
- Dataset: (https://github.com/KeithGalli/pandas) (csv file is also available in the Project 1 folder)
- Key Tasks:
- Performed data cleaning & feature engineering
- Conducted exploratory data analysis (EDA)
- Used machine learning models (I applied all the models mentioned above and evaluated which ones work and which don't for extracting useful information)
- Applied SQL for data processing and analysis as an alternative to Python Pandas and PySpark
- Structure: It has 3 main headers: Data Analysis (Using Pandas), Data Science (Using Pandas), and Transferring the data to MySQL (see the transfer sketch below)
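As a hedged illustration of the "Transferring the data to MySQL" step, this sketch writes a cleaned DataFrame to a MySQL table with SQLAlchemy, assuming the PyMySQL driver is installed; the connection string, credentials, file name, and table name are placeholders, and the projects' actual loading code may differ.

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder credentials and database name; adjust for a real MySQL instance.
engine = create_engine("mysql+pymysql://user:password@localhost:3306/portfolio_db")

# Hypothetical cleaned output from the analysis step.
df = pd.read_csv("cleaned_sales.csv")

# Write the DataFrame to a MySQL table, replacing it if it already exists.
df.to_sql("sales", con=engine, if_exists="replace", index=False)

# Read it back to confirm the transfer worked.
print(pd.read_sql("SELECT COUNT(*) AS n_rows FROM sales", con=engine))
```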
- Objective: Cleaning and visualizing the data, then using statistics and various data science models to draw conclusions. This project was purposefully built on a small number of rows to emphasize that machine learning models need a sufficiently large sample to work properly, as their results on tiny datasets can be deceiving (see the sketch after this project's details).
- Dataset: ChatGPT-generated data (available in the Project 2 folder as a CSV file)
- Key Tasks:
- Performed data cleaning & feature engineering
- Conducted exploratory data analysis (EDA)
- Used machine learning models (I applied all the models mentioned above except the time series ones and evaluated which ones work and which don't for extracting useful information)
- Applied SQL for data processing and analysis as an alternative to Python Pandas and PySpark
- Structure: It has 3 main headers: Data Analysis (Using Pandas), Data Science (Using Pandas), and Transferring the data to MySQL
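To illustrate the point about small samples being deceiving, here is a hedged sketch on synthetic data: with only a handful of rows, a flexible model can score perfectly on the rows it has seen while telling you almost nothing about new data. The data and split sizes below are made up for demonstration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Ten rows of pure noise: the labels have no real relationship to the features.
X = rng.normal(size=(10, 4))
y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

model = DecisionTreeClassifier(random_state=0)

# Training accuracy looks perfect on a tiny sample...
model.fit(X, y)
print("Training accuracy:", model.score(X, y))

# ...but cross-validated accuracy hovers around chance, exposing the illusion.
print("Cross-validated accuracy:", cross_val_score(model, X, y, cv=5).mean())
```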
- Objective: Cleaning and visualizing the data, then using statistics and various data science models to draw conclusions
- Dataset: (csv files available in the Project 3 folder)
- Key Tasks:
- Performed data cleaning & feature engineering
- Conducted exploratory data analysis (EDA)
- Built and fine-tuned machine learning models (I applied all the models mentioned above except the time series ones and evaluated which ones work and which don't for extracting useful information)
- Applied SQL for data processing and analysis as an alternative to Python Pandas and PySpark
- Structure: It has 3 main headers: Data Analysis (Using Pandas), Data Science (Using Pandas), and Transferring the data to MySQL
- Objective: Cleaning and visualizing the data, then using statistics and various data science models to draw conclusions
- Dataset: (https://www.kaggle.com/datasets/safaeahb/car-sales-analysis-dashboard/data?select=car+sales.csv) (csv file is also available in the Project 4 folder)
- Key Tasks:
- Performed data cleaning & feature engineering
- Conducted exploratory data analysis (EDA)
- Built and fine-tuned machine learning models (I applied all the models mentioned above except the time series ones and evaluated which ones work and which don't for extracting useful information)
- Applied SQL for data processing and analysis as an alternative to Python Pandas and PySpark
- Structure: It has 3 main headers: Data Analysis (Using Pandas), Data Science (Using Pandas), and Transferring the data to MySQL
- Objective: Cleaning and visualizing the data, then using statistics and various data science models to draw conclusions
- Dataset: (https://github.com/KeithGalli/Regression-Example) (csv file is also available in the Project 5 folder)
- Key Tasks:
- Performed data cleaning & feature engineering
- Conducted exploratory data analysis (EDA)
- Built and fine-tuned machine learning models (I applied all the models mentioned above except the time series ones and evaluated which ones work and which don't for extracting useful information)
- Applied SQL for data processing and analysis as an alternative to Python Pandas and PySpark
- Structure: It has 3 main headers: Data Analysis (Using Pandas), Data Science (Using Pandas), and Transferring the data to MySQL
- Objective: Cleaning and visualizing the data, then using statistics and various data science models to draw conclusions
- Dataset: (https://archive.ics.uci.edu/dataset/502/online+retail+ii)
- Key Tasks:
- Performed data cleaning & feature engineering
- Conducted exploratory data analysis (EDA)
- Built and fine-tuned machine learning models
- Applied SQL for data processing and analysis as an alternative to Python Pandas and PySpark
- Structure: It has 3 main headers: Data Analysis (Using Pandas), Data Science (Using Pandas), and Transferring the data to MySQL
- Objective: Cleaning and visualizing the data, then using statistics and various data science models to draw conclusions
- Dataset: (https://github.com/harshbg/Telecom-Churn-Data-Analysis/blob/master/Telecom%20Churn.csv) (csv file is also available in the Project 7 folder)
- Key Tasks:
- Performed data cleaning & feature engineering
- Conducted exploratory data analysis (EDA)
- Used machine learning models (I applied all the models mentioned above and evaluated which ones work and which don't for extracting useful information)
- Applied SQL for data processing and analysis as an alternative to Python Pandas and PySpark
- Structure: It has 3 main headers: Data Analysis (Using Pandas), Data Science (Using Pandas), and Transferring the data to MySQL
This folder contains implementations of commonly used machine learning models (which correspond to the headers), including the following; a minimal forecasting sketch appears after the list:
- Linear Regression
- Logistic Regression
- Random Forest, Decision Tree
- Gradient Boosting
- Neural Networks (Basic MLP)
- K-Means
- Principal Component Analysis (PCA)
- Time Series Forecasting (ARIMA, SARIMA, Exponential Smoothing)
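As a hedged example of the time series forecasting part, here is a minimal sketch that fits a Holt-Winters exponential smoothing model with statsmodels on a synthetic monthly series; the data, seasonal period, and forecast horizon are illustrative assumptions rather than the repository's actual configuration.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Synthetic monthly series with trend and yearly seasonality (illustrative only).
rng = np.random.default_rng(42)
index = pd.date_range("2018-01-01", periods=60, freq="MS")
values = (
    100
    + 0.5 * np.arange(60)                          # upward trend
    + 10 * np.sin(2 * np.pi * np.arange(60) / 12)  # yearly seasonality
    + rng.normal(0, 2, 60)                         # noise
)
series = pd.Series(values, index=index)

# Fit exponential smoothing with additive trend and seasonality.
model = ExponentialSmoothing(series, trend="add", seasonal="add", seasonal_periods=12)
fit = model.fit()

# Forecast the next 12 months.
print(fit.forecast(12))
```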