This repository documents what I learned from the coursework and projects I completed in the JHU Data Science with R specialization.
- About The Specialization
- List of Courses and Things Learnt
- Capstone Project
- Further Recommended Readings
This Data Science with R specialization is an introductory, 10-course specialization offered by professors from Johns Hopkins University. It teaches the fundamental concepts and techniques needed to get started with data science.
I learned and made use of the following R packages in this specialization:
Course 1. The Data Scientist's Toolbox
- Overview of data, questions, and tools that data scientists work with.
- Introduction to version control, markdown, git, GitHub, R, and RStudio.
Course 2. R Programming
- Installation and configuration of R
- Programming in R, reading data into R, accessing R packages, writing R functions, debugging, profiling R code, and organizing and commenting R code
Course 3. Getting and Cleaning Data
- Obtaining data from the web, from APIs, and from databases in various formats
- Basics of data cleaning and data sharing
- Data components such as raw data, processing instructions, codebooks, and processed data.
Course 4. Exploratory Data Analysis
- Essential exploratory techniques for summarizing data
- Plotting systems in R, basic principles of constructing data graphics
- Multivariate statistical techniques used to visualize high-dimensional data
Course 5. Reproducible Research
- Reporting data analyses in a reproducible manner
- knitr R library
Course 6. Statistical Inference
- Drawing conclusions about scientific truths from data.
- Bayesian and likelihood theories
- Resolving missing data, observed and unobserved confounding, biases
Course 7. Regression Models
- Regression analysis, least squares and inference using regression models
- ANOVA and ANCOVA, analysis of residuals and variability
- Model selection methods
- Scatterplot Smoothing
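
As a minimal sketch of the least squares and residual topics listed for Course 7, the snippet below fits a simple linear regression with base R's `lm()` and checks the estimates against the closed-form formulas. The choice of the built-in `mtcars` data set and the `mpg ~ wt` model is an assumption made for illustration; it is not taken from the course material.

```r
# Illustrative only: mtcars and the mpg ~ wt model are assumed for this sketch.
# Fit miles per gallon (mpg) as a linear function of weight (wt).
fit <- lm(mpg ~ wt, data = mtcars)

# Closed-form least-squares estimates for simple linear regression:
# slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x).
slope <- cov(mtcars$wt, mtcars$mpg) / var(mtcars$wt)
intercept <- mean(mtcars$mpg) - slope * mean(mtcars$wt)

# Residuals of a least-squares fit that includes an intercept sum to zero
# (up to floating-point error).
res <- resid(fit)

print(coef(fit))
print(c(intercept = intercept, slope = slope))
print(sum(res))
```

The two printed coefficient vectors should agree, which is a quick way to see that `lm()` is computing the ordinary least-squares solution.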
Course 8. Practical Machine Learning
- Basic components of building and applying prediction functions for real world applications.
- Training and test sets, overfitting, error rates, feature creation and evaluation.
- Regression, classification trees, Naive Bayes, and random forests.
Course 9. Developing Data Products
- Basics of creating data products using Shiny, R packages, and interactive graphics.
The Capstone Project was to build a next-word prediction app using a commercial dataset provided by SwiftKey.
The app is available with a live demo at the above URL. It is an R Shiny app.
The Capstone Presentation contains more details about how the model works. This presentation was made using R Markdown.
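
To give a rough idea of what next-word prediction involves, here is a minimal sketch in base R that counts bigrams and suggests the most frequent follower of a word. It is not the model used in the capstone app; the toy `corpus` and the helper names `build_bigrams` and `predict_next` are hypothetical and exist only for this illustration.

```r
# Hypothetical toy corpus standing in for real training text.
corpus <- c("i love data science",
            "i love r programming",
            "i love data analysis",
            "data science is fun")

# Count how often each word is immediately followed by each other word.
build_bigrams <- function(lines) {
  pairs <- lapply(lines, function(line) {
    words <- strsplit(trimws(tolower(line)), "\\s+")[[1]]
    data.frame(first = head(words, -1), second = tail(words, -1),
               stringsAsFactors = FALSE)
  })
  bigrams <- do.call(rbind, pairs)
  # Cross-tabulate the pairs; the result has columns first, second, Freq.
  as.data.frame(table(first = bigrams$first, second = bigrams$second),
                stringsAsFactors = FALSE)
}

# Return the most frequent follower of `word`, or NA if the word was never
# seen as the first element of a bigram.
predict_next <- function(counts, word) {
  candidates <- counts[counts$first == tolower(word) & counts$Freq > 0, ]
  if (nrow(candidates) == 0) return(NA_character_)
  candidates$second[which.max(candidates$Freq)]
}

counts <- build_bigrams(corpus)
print(predict_next(counts, "love"))
print(predict_next(counts, "unseen"))
```

A practical predictor would typically use longer n-grams and some fallback for unseen words; this sketch only shows the counting idea.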