
Capstone Project AI Academy - Time Series Flask Application

Check Point 1

Task 1.1 - Data Manipulation using Python

  1. Check for missing values in the dataset.
  2. Find the outliers in the numerical columns.
  3. Identify the product categories with the highest and lowest sales.
  4. Identify the category of product that is sold with the highest discounts.
  5. Compute the percentage share of sales for every state and region.
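
A minimal pandas sketch for these five checks; the file name and the `Sales`, `Discount`, `Category`, `State`, and `Region` column names are assumptions inferred from the task wording:

```python
import pandas as pd

df = pd.read_csv("superstore.csv")  # assumed file name

# 1. Missing values per column
print(df.isna().sum())

# 2. Outliers in numerical columns via the 1.5 * IQR rule
for col in df.select_dtypes("number"):
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    mask = (df[col] < q1 - 1.5 * iqr) | (df[col] > q3 + 1.5 * iqr)
    print(col, mask.sum(), "outliers")

# 3. Categories with highest / lowest total sales
sales = df.groupby("Category")["Sales"].sum()
print(sales.idxmax(), sales.idxmin())

# 4. Category sold with the highest average discount
print(df.groupby("Category")["Discount"].mean().idxmax())

# 5. Percentage share of sales per state and per region
print(df.groupby("State")["Sales"].sum() / df["Sales"].sum() * 100)
print(df.groupby("Region")["Sales"].sum() / df["Sales"].sum() * 100)
```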

Task 1.2 - SQL & Oracle

Stage 1:

  1. Construct an ER diagram for the above-mentioned requirement.
  2. Construct tables as per the ER diagram.
  3. Identify the relationships between the tables and apply the appropriate standards where applicable.
  4. Insert the appropriate data into the identified tables from the sample dataset provided.

Stage 2:

  5. Display the category name of the products that earn the highest profit.
  6. Display the name of the most valuable customer, i.e. the one who has most frequently ordered products.
  7. Display the segment and product name that yield the highest profit.
  8. Display the name of the city with the highest sales.
  9. Display the percentage share of profit of every region.
  10. Display the percentage share of sales of every state.
  11. Display the category of product with the highest demand in each region.
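
As an illustration of Stage 2 query 5, a sketch run here against SQLite as a stand-in for Oracle; the `orders` table and its `category`/`profit` columns are assumptions to be adapted to your ER diagram (Oracle would use `FETCH FIRST 1 ROWS ONLY` instead of `LIMIT`):

```python
import sqlite3  # stand-in for the project's Oracle connection

conn = sqlite3.connect("capstone.db")  # assumed local copy of the tables

# Query 5: category that earns the highest total profit.
# Table and column names are assumptions from the sample dataset.
query = """
SELECT category, SUM(profit) AS total_profit
FROM orders
GROUP BY category
ORDER BY total_profit DESC
LIMIT 1;
"""
for row in conn.execute(query):
    print(row)
```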

Task 1.3 - Statistical Analysis using Python

  1. Compute descriptive statistics for both numerical and categorical columns and draw a few insights from them.
  2. Perform relevant hypothesis testing (t-test, chi-square, and ANOVA tests).
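
A sketch of the three tests with scipy, again assuming hypothetical column names (`Sales`, `Segment`, `Category`, `Region`):

```python
import pandas as pd
from scipy import stats

df = pd.read_csv("superstore.csv")  # assumed file name

# t-test: do two segments differ in mean sales? (Welch's variant)
a = df.loc[df["Segment"] == "Consumer", "Sales"]
b = df.loc[df["Segment"] == "Corporate", "Sales"]
print(stats.ttest_ind(a, b, equal_var=False))

# chi-square: is product category independent of region?
print(stats.chi2_contingency(pd.crosstab(df["Category"], df["Region"])))

# one-way ANOVA: does mean sales differ across regions?
groups = [g["Sales"] for _, g in df.groupby("Region")]
print(stats.f_oneway(*groups))
```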

Check Point 2

Task 2.1 - Visualization using Python

Come up with appropriate results and visuals for the following:

Product Analysis:

  1. Different types of product categories
  2. Various sub-categories of products
  3. Product with highest demand in each segment
  4. Category of the product with highest demand in each region
  5. Category of product sold with the highest discounts

Sales/Profit Analysis:

  6. Category of product that earns the highest profit
  7. Most valuable customer
  8. Segment that yields the highest profit
  9. City with the highest sales
  10. Percentage share of profit of every region
  11. Percentage share of sales of every state
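
For example, items 10 and 11 reduce to grouped sums plotted as pie and bar charts; a minimal matplotlib sketch reusing the column-name assumptions from Task 1.1:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("superstore.csv")  # assumed file name

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))

# 10. Percentage share of profit of every region
profit = df.groupby("Region")["Profit"].sum()
(profit / profit.sum() * 100).plot.pie(
    autopct="%.1f%%", ax=ax1, title="Profit share by region")

# 11. Percentage share of sales of every state
sales = df.groupby("State")["Sales"].sum()
(sales / sales.sum() * 100).sort_values().plot.barh(
    ax=ax2, title="Sales share by state (%)")

plt.tight_layout()
plt.show()
```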

Task 2.2 - Exploratory Data Analysis

  1. Univariate, bivariate, and multivariate analysis
  2. Missing values identification and treatment
  3. Outlier analysis and treatment
  4. Data scaling using min-max and/or Z-score normalisation
  5. Data transformation
  6. Feature Engineering
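
Item 4 can be done by hand (or with scikit-learn's scalers); a sketch of both normalisations over the numeric columns:

```python
import pandas as pd

df = pd.read_csv("superstore.csv")  # assumed file name
num = df.select_dtypes("number")

# Min-max scaling: map each column onto [0, 1]
minmax = (num - num.min()) / (num.max() - num.min())

# Z-score normalisation: zero mean, unit standard deviation
zscore = (num - num.mean()) / num.std()

print(minmax.describe().loc[["min", "max"]])
print(zscore.describe().loc[["mean", "std"]])
```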

Task 2.3 - Visualization using PowerBI

Connect the data to Power BI Desktop and perform data manipulation using the Power Query Editor. Perform the tasks below in Power BI Desktop.

  1. Determine which State in the Central Region has the highest sales
  2. Identify the City with Highest Sales in California
  3. In which Region do all Product Categories fall beneath the overall average profit?
  4. Find the top 10 Product Names by Sales within each region
  5. Product with highest demand in each segment
  6. Which product is ranked #2 by Sales in the West region?
  7. Trend in profit/sales across regions

Task 2.4 - Model Building using ML

Build a time series model by aggregating the daily data into weekly or monthly data. Apply various time series models such as moving average and exponential smoothing.

  1. Compare various Time series models
  2. Evaluate the performance of the model
  3. Identify the right metric to evaluate the performance of the model
  4. Identify issues and concerns with the given data and suggest the best technique(s) to overcome them.

Recommendations: As a data analyst, what approaches do you suggest to the sales team to forecast sales more accurately? Recommend based on your analysis.
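
To compare the models and pick a metric (items 1–3), a minimal sketch using statsmodels, evaluated by MAPE on a held-out tail; the monthly aggregation and the `Order Date`/`Sales` column names are assumptions:

```python
import pandas as pd
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

df = pd.read_csv("superstore.csv", parse_dates=["Order Date"])  # assumed columns
monthly = df.set_index("Order Date")["Sales"].resample("MS").sum()

train, test = monthly[:-6], monthly[-6:]  # hold out the last 6 months

# Moving average: forecast every step with the mean of the last 3 observations
ma_forecast = pd.Series(train.rolling(3).mean().iloc[-1], index=test.index)

# Simple exponential smoothing, parameters fitted by statsmodels
ses = SimpleExpSmoothing(train).fit()
ses_forecast = ses.forecast(len(test))

def mape(actual, pred):
    """Mean absolute percentage error, a scale-free accuracy metric."""
    return (abs(actual - pred) / actual).mean() * 100

print("MA  MAPE:", mape(test, ma_forecast))
print("SES MAPE:", mape(test, ses_forecast))
```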

Check Point 3

Task 3.1 - PySpark and Hadoop

Big Data technologies like HDFS, Hive, and PySpark need to be used as the historical data grows in size. As part of this task, the following activities need to be done.

  1. Develop a PySpark application to load the data into Spark DataFrames and save it into Hive tables on a Hadoop cluster in Parquet format.
  2. Perform profiling of the data through PySpark and, wherever the source is an RDBMS, ensure that it is migrated correctly.
  3. Write PySpark routines to cleanse the data, handle missing values, and apply the data transformations identified in Task 1.1, again making sure that the data is written to Hive tables in an efficient format.
  4. If the predictive model identified in Task 2.4 is available in Spark MLlib, develop a PySpark application to implement and evaluate the ML model with appropriate metrics.
  5. Ensure that best practices are followed and that the design and code take advantage of Spark's features.
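
A sketch of item 1, assuming a Spark session with Hive support and hypothetical paths and table names:

```python
from pyspark.sql import SparkSession

# Hive support lets saveAsTable register tables in the Hive metastore
spark = (SparkSession.builder
         .appName("capstone-load")
         .enableHiveSupport()
         .getOrCreate())

df = (spark.read
      .option("header", True)
      .option("inferSchema", True)
      .csv("hdfs:///data/superstore.csv"))  # assumed HDFS path

# Quick profiling before the write
df.printSchema()
print("rows:", df.count())

# Write to a Hive table in Parquet, an efficient columnar format
(df.write
   .mode("overwrite")
   .format("parquet")
   .saveAsTable("capstone.sales"))  # assumed database.table
```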

Task 3.2 - AWS

  1. Move the datasets to AWS S3.
  2. Create a Redshift instance.
  3. Create the required tables in Redshift.
  4. Create a data pipeline/COPY command to move the data from storage to the data warehouse (Redshift). Other copy mechanisms may be used as well.
  5. Connect the Redshift data to Power BI.
  6. Perform the tasks mentioned in Task 2.3 (only the 4 core reports).
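
The COPY step (item 4) can be issued from Python with `psycopg2`; the cluster endpoint, credentials, bucket, IAM role, and table name below are all placeholders:

```python
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",  # placeholder
    dbname="dev", user="awsuser", password="changeme", port=5439,
)

# Redshift pulls the file from S3 itself; the client only issues the command
copy_sql = """
COPY sales
FROM 's3://my-capstone-bucket/superstore.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
FORMAT AS CSV
IGNOREHEADER 1;
"""

with conn, conn.cursor() as cur:
    cur.execute(copy_sql)
```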

Task 3.3 - Flask Application

Deploy the machine learning model created in Task 2.4 in a Flask application.
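
A minimal sketch of such a deployment, assuming the Task 2.4 model was pickled to `model.pkl` and, like a fitted statsmodels model, exposes a `forecast(steps)` method (both assumptions):

```python
import pickle
from flask import Flask, jsonify, request

app = Flask(__name__)

with open("model.pkl", "rb") as f:  # assumed artifact from Task 2.4
    model = pickle.load(f)

@app.route("/forecast")
def forecast():
    steps = int(request.args.get("steps", 6))  # periods ahead, default 6
    preds = model.forecast(steps)
    return jsonify([float(p) for p in preds])

if __name__ == "__main__":
    app.run(debug=True)
```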
