Datasets Analysis in R

Description:

This project analyzes datasets from Kaggle and other sources, both to gain insight into the data and to sharpen my R coding skills along the way. New datasets and analyses are added over time.

Zhiwei's Portfolio Projects:

Netflix movies and TV shows exploratory data analysis:

Dataset: Netflix movies and TV shows dataset. It lists the movies and TV shows that are or were available on Netflix, with details such as cast, directors, ratings, release year, and duration.
Code: Netflix Content.rmd.
Goal:

Visualize the number of movies and TV shows available on Netflix.
Visualize the number of movies and TV shows for each rating.
Visualize the number of movies produced by each country using a world map plot in ggplot2.

Skill: Data Cleaning, Univariate Analysis, Bivariate Analysis, Descriptive Statistics.
Requirements: The following R packages and versions:

tidyverse version: 1.3.2.
skimr version: 2.1.5.
ggplot2 version: 3.4.0.

Code Output: Netflix Content output.pdf.
Result: Netflix has twice as many movies as TV shows, and most are produced in the United States, followed by India.
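
The counting behind the goals above can be sketched as follows. This is a minimal illustration rather than the code in the notebook: the data frame is a made-up stand-in for netflix_titles.csv, and the column names `type`, `rating`, and `country` are assumed to match the Kaggle file. The world map step is left out.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Toy stand-in for netflix_titles.csv; the values are invented for illustration.
titles <- data.frame(
  type    = c("Movie", "Movie", "TV Show", "Movie", "TV Show"),
  rating  = c("TV-MA", "PG-13", "TV-MA", "TV-14", "TV-14"),
  country = c("United States", "India", "United States, India", NA, "Japan"),
  stringsAsFactors = FALSE
)

# Number of titles per type.
type_counts <- count(titles, type)

# Number of titles per rating, split by type.
rating_counts <- count(titles, rating, type)

# A title can list several countries, so put each one on its own row
# before counting. Rows with no country are dropped first.
country_counts <- titles %>%
  filter(!is.na(country)) %>%
  separate_rows(country, sep = ",\\s*") %>%
  count(country, sort = TRUE)

# Bar chart of titles per rating, with movies and TV shows side by side.
rating_plot <- ggplot(rating_counts, aes(x = rating, y = n, fill = type)) +
  geom_col(position = "dodge") +
  labs(x = "Rating", y = "Number of titles", fill = "Type")
```

Printing `rating_plot` draws the chart. `country_counts` is the kind of table that could then be joined to map polygons for the world map plot.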

Insurance Cost Regression Analysis:

Dataset: Insurance Cost Datasets. This dataset comprises 1,338 rows and 7 variables: age, sex, BMI, number of children, smoker status, region, and insurance charges. The target variable is the cost of insurance claims.
Code: insurance-cost-regression.rmd.
Goal:

Conduct exploratory data analysis on the insurance dataset.
Forecast insurance costs.

Skill: Data Cleaning, Univariate Analysis, Bivariate Analysis, Descriptive Statistics, Multiple Linear Regression, Box-Cox Transformation, Random Forest.
Requirements: The following R packages and versions:

tidyverse version: 1.3.2
skimr version: 2.1.5
ggplot2 version: 3.4.0
car version: 3.1-1
MASS version: 7.3-58.2
GGally version: 2.1.2
randomForest version: 4.7-1.1
caret version: 6.0-93

Code Output: Insurance cost regression output.pdf
Result: The variable smoker is the most significant variable in determining insurance charges, followed by Body Mass Index (BMI) and age. Other factors such as the number of children, region, and sex have minor or no impact on the charges.
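
To illustrate the Box-Cox step listed under Skill, here is a minimal sketch on simulated data. It is not the notebook's code: the column names follow the dataset description, but the values, the coefficients used to simulate them, and the helper `bc_transform` are all assumptions made for the example.

```r
library(MASS)  # provides boxcox()

set.seed(1)
n <- 200

# Simulated data: column names mirror the dataset description, values do not.
insurance <- data.frame(
  age    = sample(18:64, n, replace = TRUE),
  bmi    = round(rnorm(n, mean = 30, sd = 6), 1),
  smoker = factor(sample(c("no", "yes"), n, replace = TRUE, prob = c(0.8, 0.2)))
)
insurance$charges <- exp(
  7 + 0.03 * insurance$age + 0.01 * insurance$bmi +
    1.5 * (insurance$smoker == "yes") + rnorm(n, sd = 0.3)
)

# Baseline multiple linear regression on the raw response.
fit_raw <- lm(charges ~ age + bmi + smoker, data = insurance)

# Profile the Box-Cox log-likelihood over a grid of lambda values
# and keep the lambda with the highest likelihood.
bc <- boxcox(fit_raw, lambda = seq(-2, 2, by = 0.05), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]

# Hypothetical helper: Box-Cox transform, falling back to log at lambda = 0.
bc_transform <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Refit the regression on the transformed response.
insurance$charges_bc <- bc_transform(insurance$charges, lambda_hat)
fit_bc <- lm(charges_bc ~ age + bmi + smoker, data = insurance)

summary(fit_bc)$coefficients
```

Because the simulated charges are log-normal, the selected lambda should land near zero, which corresponds to a log transform. On the real data the chosen lambda may differ.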

Housing Price Prediction:

Dataset: House Prices
Code: housing-prediction.rmd.
Goal:

Predict housing prices.

Skill: Linear Regression (Ridge Regression, Lasso Regression, and Elastic Net Regression).

Requirements: The following R packages and versions:

tidyverse version: 1.3.2
glmnet

Code Output: Housing Price output.pdf
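
The three penalized regressions named under Skill differ only in the `alpha` argument of glmnet. The sketch below compares them on simulated data. The predictors, coefficients, and train/test split are invented for illustration and are not taken from the House Prices dataset or the notebook.

```r
library(glmnet)

set.seed(42)
n <- 200
p <- 10

# Simulated predictors; only the first three carry signal.
x <- matrix(rnorm(n * p), nrow = n, dimnames = list(NULL, paste0("x", 1:p)))
y <- 3 * x[, 1] - 2 * x[, 2] + 1.5 * x[, 3] + rnorm(n)

# Simple hold-out split.
train <- 1:150
test  <- 151:200

# alpha = 0 is ridge, alpha = 1 is lasso, values in between are elastic net.
alphas <- c(ridge = 0, elastic_net = 0.5, lasso = 1)

# For each penalty, choose lambda by cross-validation on the training rows,
# then report the root mean squared error on the held-out rows.
rmse <- sapply(alphas, function(a) {
  cv_fit <- cv.glmnet(x[train, ], y[train], alpha = a)
  pred <- predict(cv_fit, newx = x[test, ], s = "lambda.min")
  sqrt(mean((y[test] - pred)^2))
})

rmse
```

With real housing data, categorical columns would first need to be expanded into a numeric matrix (for example with `model.matrix`), since glmnet does not accept data frames or factors directly.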

Acknowledgements:

Netflix movies and TV shows dataset

Medical Cost Personal Datasets
