Repository for storing code for my MS in Data Science course CS675 Introduction to Data Science at Pace University.

awesomecosmos/CS675-Intro-To-Data-Science

CS675-Introduction-to-Data-Science

Course description: This course introduces students to Machine Learning and Deep Learning technologies, Data Analytics at scale, and data-driven science systems for extracting insights from data in various forms. These scientific processes will include phases and techniques such as Data Preparation, Model Building and Prediction, Clustering, Association, Regression (Linear and Logistic), Classification, Decision Trees, Textual Data Analysis, and Data Presentation. The basic concepts will be covered with examples that can be tried in R or Python using RStudio and/or Jupyter Notebooks (aka IPython Notebooks). These miniaturized versions of real-world problems are designed so that students gain a clear understanding of, and a firm foundation in, the methods covered in the course. In addition, the course gives an introduction to the R statistical language, Apache Spark (via Databricks), and the Anaconda analytics platform.

Project 1

In this project, the task was to perform an exploratory data analysis (EDA) on a dataset of customer churn in the telecommunications industry (churn is when customers leave the company). I inspected the raw dataset, cleaned it, and examined each variable and its relationships with the others in order to identify the variables that affect customer churn.
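The cleaning and per-variable checks described above could look roughly like the following pandas sketch. It is not the project's notebook: the column names (`Churn`, `Contract`, `MonthlyCharges`, `TotalCharges`) and the toy rows are hypothetical stand-ins for a typical telecom churn table.

```python
import pandas as pd


def clean_churn_data(df: pd.DataFrame) -> pd.DataFrame:
    """Basic cleaning for a churn table (column names are hypothetical)."""
    out = df.copy()
    # Numeric columns exported as text often contain blanks; coerce them to NaN.
    out["TotalCharges"] = pd.to_numeric(out["TotalCharges"], errors="coerce")
    # Drop rows whose numeric fields could not be parsed.
    out = out.dropna(subset=["TotalCharges"])
    # Encode the target as 0/1 so that its mean is the churn rate.
    out["Churn"] = out["Churn"].map({"No": 0, "Yes": 1})
    return out


def churn_rate_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Share of churned customers within each level of a categorical column."""
    return df.groupby(column)["Churn"].mean().sort_values(ascending=False)


def numeric_profile(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Mean of each numeric column, split by churned (1) vs. retained (0)."""
    return df.groupby("Churn")[columns].mean()


if __name__ == "__main__":
    # Toy rows for illustration only; not the project's dataset.
    raw = pd.DataFrame({
        "Contract": ["Month-to-month", "Month-to-month", "One year", "Two year", "One year"],
        "MonthlyCharges": [70.0, 90.0, 55.0, 20.0, 60.0],
        "TotalCharges": ["70.0", "180.0", " ", "480.0", "720.0"],
        "Churn": ["Yes", "Yes", "No", "No", "Yes"],
    })
    data = clean_churn_data(raw)
    print(churn_rate_by(data, "Contract"))
    print(numeric_profile(data, ["MonthlyCharges", "TotalCharges"]))
```

Encoding `Churn` as 0/1 makes each group mean equal to that group's churn rate, so the same helper works for any categorical column.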

Project 2

This project was a continuation of Project 1. Working with the same dataset, I performed several stages of a machine learning workflow and used four models to generate predictions: Naive Bayes, Logistic Regression, Random Forest, and XGBoost. I applied SMOTE (Synthetic Minority Over-sampling Technique) to address class imbalance, tuned the hyperparameters of these models, and compared them to identify the best model for predicting churn.
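SMOTE balances the classes by synthesizing new minority-class rows instead of duplicating existing ones. The sketch below is a simplified, NumPy-only illustration of that interpolation idea, assuming purely numeric features; it is not the code used in the project, and the function name `smote` and its parameters are hypothetical.

```python
import numpy as np


def smote(X_minority: np.ndarray, n_new: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Generate n_new synthetic minority-class rows by SMOTE-style interpolation.

    For each synthetic row: pick a random minority sample, pick one of its k
    nearest minority-class neighbours, and place a new point at a random
    position on the line segment between them.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X_minority, dtype=float)
    n = X.shape[0]
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)

    # Pairwise Euclidean distances among minority samples.
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    # Indices of the k nearest neighbours of each sample.
    neighbours = np.argsort(dists, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X[base] + gap * (X[chosen] - X[base])


if __name__ == "__main__":
    X_min = np.array([[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]])
    synthetic = smote(X_min, n_new=6, k=2)
    print(synthetic.shape)  # (6, 2)
```

In practice the imbalanced-learn library provides this as `imblearn.over_sampling.SMOTE` (used via `fit_resample(X, y)`). Oversampling should be applied only to the training split, so that synthetic rows interpolated from test-set samples cannot end up in training and inflate the evaluation scores.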

Project 3

In this project, the task was to perform predictive time-series forecasting on a dataset of New York City's electricity consumption across its five boroughs. I extracted the data from the City of New York website and used the FB Prophet package for Python to produce the forecasts.
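Prophet expects a DataFrame with a `ds` (date) column and a `y` (value) column. A minimal sketch of preparing one borough's series and fitting a model might look like this; the input column names (`Borough`, `Date`, `Consumption_kWh`), the helper names, and the monthly aggregation are assumptions, not details of the project's dataset.

```python
import pandas as pd


def to_prophet_frame(df: pd.DataFrame, borough: str) -> pd.DataFrame:
    """Monthly consumption for one borough in Prophet's required ds/y layout.

    Column names "Borough", "Date" and "Consumption_kWh" are hypothetical.
    """
    subset = df[df["Borough"] == borough].copy()
    subset["Date"] = pd.to_datetime(subset["Date"])
    monthly = (
        subset.set_index("Date")["Consumption_kWh"]
        .resample("MS")  # month-start bins
        .sum()  # months with no readings sum to 0
        .reset_index()
    )
    # Prophet expects exactly these two column names.
    return monthly.rename(columns={"Date": "ds", "Consumption_kWh": "y"})


def forecast_borough(history: pd.DataFrame, months_ahead: int = 12) -> pd.DataFrame:
    """Fit Prophet on a ds/y frame and forecast `months_ahead` months past the data."""
    from prophet import Prophet  # imported here so the prep code runs without it

    model = Prophet()
    model.fit(history)
    future = model.make_future_dataframe(periods=months_ahead, freq="MS")
    forecast = model.predict(future)
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]]
```

The package was distributed as `fbprophet` before version 1.0 and as `prophet` afterwards, so older code may use `from fbprophet import Prophet` instead. Because `make_future_dataframe` keeps the history by default, `forecast_borough` returns fitted values for the observed months plus the forecast months, each with an uncertainty interval.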
