CS675-Introduction-to-Data-Science

Repository for storing code for my MS in Data Science course CS675 Introduction To Data Science at Pace University.

Course description: This course introduces students to Machine Learning and Deep Learning Technologies, Data Analytics at scale, and Data-driven Science systems in order to extract insights from data in various forms. These scientific processes include various phases and techniques such as Data Preparation, Model Building and Prediction, Clustering, Association, Regression (Linear and Logistic), Classification, Decision Trees, Textual Data Analysis, and Data Presentation. The basic concepts are covered with examples that can be tried in R or Python using RStudio and/or Jupyter Notebooks (aka IPython Notebooks). These miniaturized examples of real-world problems are designed in such a way that the student gains a clear understanding of, and a firm foundation in, the methods covered in the course. In addition, the course gives an introduction to the R Statistical Language, Apache (Databricks) Spark, and the Anaconda Analytics platform.

Project 1

In this project, the task was to perform an EDA (Exploratory Data Analysis) on a dataset of customer churn in the telecommunications industry (churn is when customers leave the company). I inspected the raw dataset, cleaned it, and examined each variable and the relationships between variables in order to identify which of them affect customer churn.
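
As a rough illustration of the kind of check an EDA like this involves, the sketch below computes the churn rate per category of one variable. It is not the project's code: the column names `Contract` and `Churn`, the `"Yes"`/`"No"` encoding, and the sample rows are all assumptions made up for the example.

```python
from collections import defaultdict


def churn_rate_by(records, column, target="Churn"):
    """Return {category: fraction of customers who churned} for one column."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in records:
        key = row[column]
        totals[key] += 1
        if row[target] == "Yes":
            churned[key] += 1
    return {key: churned[key] / totals[key] for key in totals}


# Made-up rows for illustration only; not taken from the project dataset.
customers = [
    {"Contract": "Month-to-month", "Churn": "Yes"},
    {"Contract": "Month-to-month", "Churn": "Yes"},
    {"Contract": "Month-to-month", "Churn": "No"},
    {"Contract": "Two year", "Churn": "No"},
    {"Contract": "Two year", "Churn": "No"},
]

rates = churn_rate_by(customers, "Contract")
for contract, rate in sorted(rates.items()):
    print(f"{contract}: {rate:.0%} churned")
```

A large gap in churn rate between categories suggests the variable is worth a closer look; it does not by itself show that the variable causes churn.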

Class presentation of this project: https://www.youtube.com/watch?v=0U4XsjbPn8U

Project 2

This project was a continuation of Project 1. In this project, I performed various stages of machine learning analysis on the same dataset and used four models to generate predictions: Naive Bayes, Logistic Regression, Random Forest, and XGBoost. I applied SMOTE oversampling and hyperparameter tuning to these models, then compared them to find the best model for predicting churn.
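
For readers unfamiliar with SMOTE (Synthetic Minority Over-sampling Technique), the sketch below shows the core idea: new minority-class points are created by interpolating between an existing minority point and one of its nearest minority neighbours. This is a simplified standard-library illustration, not the project's code and not the implementation of any particular library; the function name `smote_oversample`, its parameters, and the sample points are assumptions made up for the example.

```python
import math
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point, excluding itself.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Random position along the segment from base to neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Made-up 2-D feature vectors standing in for the minority (churned) class.
churners = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.2)]
new_points = smote_oversample(churners, n_new=6)
balanced_minority = churners + new_points
print(len(balanced_minority))
```

Oversampling of this kind is normally applied to the training split only, so that the test set still reflects the real class balance.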

Class presentation of this project: https://www.youtube.com/watch?v=M1PMJYq2hhI
Full code walkthrough of this project: https://www.youtube.com/watch?v=VVIC3dSqqk8&t=25s

Project 3

In this project, the task was to perform time-series forecasting on a dataset of New York City's electricity consumption across its five boroughs. I extracted the data from the City of New York website and used the FB Prophet package for Python to build the forecasts.
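
Prophet expects its input as a table with a date column named `ds` and a value column named `y`. The sketch below shows one way to get data into that shape and fit a default model. It is an illustration under assumptions, not the project's code: the helper names, the field names `month` and `kwh`, the monthly frequency, and the sample values are all hypothetical.

```python
def to_prophet_rows(records, date_key, value_key):
    """Rename arbitrary date/value fields to the `ds`/`y` names Prophet expects."""
    return [{"ds": r[date_key], "y": float(r[value_key])} for r in records]


def forecast_consumption(rows, periods=12, freq="MS"):
    """Fit a default Prophet model on ds/y rows and forecast `periods` steps ahead."""
    # Imported here so the helper above can be used without these packages installed.
    import pandas as pd
    from prophet import Prophet

    df = pd.DataFrame(rows)
    df["ds"] = pd.to_datetime(df["ds"])
    model = Prophet()
    model.fit(df)
    # Extend the date range past the history, then predict over the whole range.
    future = model.make_future_dataframe(periods=periods, freq=freq)
    forecast = model.predict(future)
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]]


# Hypothetical field names and made-up values; not the NYC dataset.
raw = [
    {"month": "2021-01-01", "kwh": "1200"},
    {"month": "2021-02-01", "kwh": "1150.5"},
]
rows = to_prophet_rows(raw, "month", "kwh")
print(rows[0])
```

Calling `forecast_consumption(rows)` requires the `pandas` and `prophet` packages and, in practice, a much longer history than the two rows shown here.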

Class presentation of this project: https://youtu.be/58s0qYSVGaQ
