ciDSproj/coursework

Relevant Coursework

Below are several assignments I completed while studying for the Certificate in Data Science from UW Professional & Continuing Education in 2021. The coursework covers topics such as Advanced Data Analysis, Hypothesis Testing, Linear & Logistic Regression, Time Series Analysis, Advanced Machine Learning, Ensemble Learning, Neural Networks, and Deep Learning.


EDA and Hypothesis Simulation for House Prices
This project contains an Exploratory Data Analysis of the house dataset, which includes data cleaning, visual exploration, and investigating summary statistics. The analysis yielded insights into the factors that influence a home's price. The project also covers Hypothesis Simulation: computing summary statistics and confidence intervals by classical bootstrap and Bayesian methods, and running hypothesis tests to determine whether the differences between pairs of populations, and among multiple populations, are significant. NumPy and Pandas are used for data cleaning, and Matplotlib and Seaborn for data visualization.
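As an illustration of the bootstrap step, here is a minimal sketch of a percentile bootstrap confidence interval for a mean. The helper name `bootstrap_mean_ci`, its parameters, and the synthetic `prices` array are assumptions made for this example; they are not taken from the assignment or its dataset.

```python
import numpy as np


def bootstrap_mean_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean (illustrative helper)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Draw row indices with replacement; each row is one bootstrap sample.
    idx = rng.integers(0, len(values), size=(n_resamples, len(values)))
    means = values[idx].mean(axis=1)
    lower, upper = np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lower), float(upper)


# Synthetic stand-in for a price column (assumption, not the course data).
rng = np.random.default_rng(42)
prices = rng.normal(loc=300_000, scale=50_000, size=500)

low, high = bootstrap_mean_ci(prices)
print(f"95% bootstrap CI for the mean price: [{low:,.0f}, {high:,.0f}]")
```

The percentile method is only one of several bootstrap interval constructions; it is used here because it is the shortest to write down.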

Linear Regression prediction of House Prices
The assignment covers selecting the features and building an initial model. The model was then improved with feature engineering and regularization techniques such as Lasso and Ridge regression. The models were compared using evaluation metrics. NumPy and Pandas are used for data cleaning and preprocessing, and scikit-learn and Statsmodels for building the models.
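A comparison of this kind might look like the sketch below, which fits an unregularized linear model alongside Ridge and Lasso and scores each on a held-out split. The synthetic data from `make_regression`, the `alpha` values, and the choice of RMSE and R² as metrics are assumptions for illustration, not the settings used in the assignment.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data standing in for the house features (assumption).
X, y = make_regression(
    n_samples=300, n_features=20, n_informative=5, noise=10.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "ols": LinearRegression(),
    "ridge": Ridge(alpha=1.0),  # L2 penalty shrinks coefficients
    "lasso": Lasso(alpha=1.0),  # L1 penalty can set coefficients to zero
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
        "r2": float(r2_score(y_test, pred)),
    }

for name, metrics in results.items():
    print(f"{name}: RMSE={metrics['rmse']:.2f}, R2={metrics['r2']:.3f}")
```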

Time Series Analysis
This project covers the analysis of a time series dataset. The analysis checks whether the time series is stationary and whether it has a seasonal component. An STL decomposition splits the series into its trend, seasonal, and residual components. The analysis also includes finding the best parameters for the ARIMA model, then computing and evaluating a production forecast for the last 12 months of the data. The project is done in Python and uses the NumPy, Pandas, and Matplotlib libraries.

Ensemble Models-Random Forest Classifier
This project covers data preprocessing and exploratory analysis for a targeted marketing campaign. To classify customers who want to receive bank promotion emails and those who don't, I built an ensemble model (random forest) and a decision tree model. The two models were compared by their accuracy scores. NumPy and Pandas are used for data preprocessing, Matplotlib and Seaborn for visualizations, and scikit-learn for building and evaluating the models.
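A minimal sketch of that comparison, assuming synthetic data from `make_classification` in place of the marketing dataset and default-ish hyperparameters that are not those of the project:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for customer records (assumption).
X, y = make_classification(
    n_samples=1000, n_features=15, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

classifiers = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    # Accuracy on the held-out split is the comparison metric.
    scores[name] = float(accuracy_score(y_test, clf.predict(X_test)))

for name, score in scores.items():
    print(f"{name}: accuracy={score:.3f}")
```

Accuracy alone can be misleading when the classes are imbalanced, so a comparison like this is often paired with precision, recall, or a confusion matrix.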

Image Identification with CNN
In this project I built a machine learning model that identifies the objects in images. It uses the CIFAR-10 dataset, so the model classifies images into 10 object classes. To improve the model's accuracy, I adjusted the number of layers and the activation function, then compared the results. NumPy and Pandas were used for preparing and exploring the data, and TensorFlow and Keras for building the neural network model.

Topic Identification with RNN and LSTM
This assignment uses the Keras Reuters newswire topics dataset to build a model that classifies the topic of each article or newswire. Three different models were built and compared, using RNN (Recurrent Neural Network) and LSTM (Long Short-Term Memory) layers. The RNN-LSTM models were built with the TensorFlow and Keras libraries.

Final Capstone Project
The capstone project uses a semiconductor manufacturing dataset. The data is cleaned and prepared, then visually explored and analyzed. Feature selection is applied to reduce the dimensionality of the data. Initially, three classification models were built and evaluated: a Decision Tree classifier, Gradient Boosting, and a Support Vector Classifier. The last part of the capstone project covered building a Multi-Layer Perceptron (MLP) model, a Deep Neural Network (DNN) model, and a Recurrent Neural Network (RNN)-LSTM model. Each model was evaluated by its accuracy. The Python libraries used are Pandas, NumPy, Matplotlib, scikit-learn, TensorFlow, and Keras.
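The first stage of the capstone, feature selection followed by three classical classifiers, could be sketched as below. The feature-selection method is not named above, so `SelectKBest` with an ANOVA F-score is an assumption, as are the synthetic data, the scaling step, and `k=20`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Wide synthetic dataset standing in for the sensor measurements (assumption).
X, y = make_classification(
    n_samples=600, n_features=100, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

classifiers = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "svc": SVC(),
}

accuracy = {}
for name, clf in classifiers.items():
    # Scaling and feature selection are fit on the training split only,
    # which the pipeline handles so no test information leaks in.
    pipe = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=20), clf)
    pipe.fit(X_train, y_train)
    accuracy[name] = float(accuracy_score(y_test, pipe.predict(X_test)))

for name, acc in accuracy.items():
    print(f"{name}: accuracy={acc:.3f}")
```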
