JuanF3/Applied-Data-Science-with-Python

Applied Data Science with Python Specialization

This is a compilation of the deliverables of the 5 courses that are part of this specialization, each one organized in its corresponding folder:

  • Introduction to Data Science in Python

  • Applied Plotting, Charting & Data Representation in Python

  • Applied Machine Learning in Python

  • Applied Text Mining in Python

  • Applied Social Network Analysis in Python

1. Introduction to Data Science in Python

In this course I learned to: understand techniques such as lambdas and manipulating CSV files; describe common Python functionality and features used for data science; query DataFrame structures for cleaning and processing; and explain distributions, sampling, and t-tests.

Skills: Python Programming, Numpy, Pandas, Data Cleansing

Final Assignment: How does a team's win/loss ratio correlate with the population of the city it plays in? The last part of this course asks for the correlation between a team's win/loss ratio and the population of its city. The data is extracted from Wikipedia, and it then has to be cleaned (correcting team names and removing stray characters) so that the tables can be joined to answer the question.
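The cleaning, joining and correlation steps described above can be sketched roughly as follows. This is a minimal illustration with pandas, not the notebook's actual code: the team names, numbers, column names and the cleaning pattern are all made up for the example, and the ratio is assumed to be wins / (wins + losses).

```python
import pandas as pd

# Hypothetical, made-up rows standing in for the tables scraped from Wikipedia.
teams = pd.DataFrame({
    "team": ["Alpha Hawks*", "Beta Lions", "Gamma Bears[note 1]", "Delta Wolves "],
    "wins": [50, 30, 45, 20],
    "losses": [32, 52, 37, 62],
})
cities = pd.DataFrame({
    "team": ["Alpha Hawks", "Beta Lions", "Gamma Bears", "Delta Wolves"],
    "population": [8_000_000, 2_500_000, 6_000_000, 1_000_000],
})

# Data cleansing: drop bracketed footnotes and asterisks, then trim whitespace,
# so that the team names match between the two tables.
teams["team"] = (
    teams["team"]
    .str.replace(r"\[.*?\]|\*", "", regex=True)
    .str.strip()
)

# Assumption: the ratio is defined as wins / (wins + losses).
teams["win_loss_ratio"] = teams["wins"] / (teams["wins"] + teams["losses"])

# Join the cleaned tables on the team name.
merged = teams.merge(cities, on="team", how="inner")

# Series.corr computes the Pearson correlation by default.
correlation = merged["win_loss_ratio"].corr(merged["population"])
print(f"Pearson correlation: {correlation:.3f}")
```

An inner join silently drops any team whose name still does not match after cleaning, so comparing `len(merged)` with `len(teams)` is a quick way to check that the cleaning worked.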

2. Applied Plotting, Charting & Data Representation in Python

In this course I learned to: describe what makes a good or bad visualization; understand best practices for creating basic charts; identify the functions that are best for particular problems; and create a visualization using matplotlib.

Skills: Python Programming, Data Visualization, Matplotlib

Final Assignment: Personal Project: What is the relationship between the mortality rates of children under 5 years of age in the 5 countries with the highest populations in the world? The assignment required finding two datasets and posing a question that could be answered through a visualization following the best practices covered in the course. I chose to work with UNICEF databases to answer the question. The results can be reviewed in the course folder.

Chart made by me with data taken from UNICEF open data

This graph provides an approximation of the distribution of the world population of children under 5 years of age in 2021. The top places correspond to India, China and Nigeria, and the ranking consists mostly of Asian and African countries.

The mortality rate of children under 5 years of age in these countries is represented by the color of each bar. The African countries in the ranking have the most intense color.

The graph therefore suggests that underdeveloped countries that lack a good health system, and that also suffer from poverty and limited access to education (including sex education), have a higher mortality rate in children under 5 years than more developed countries such as China, the USA, Brazil and even India.
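The encoding described above (bar height for the under-5 population, bar color for the under-5 mortality rate) can be sketched with matplotlib as shown here. This is only an illustration of the technique: the country labels and all numbers are placeholders, not UNICEF figures, and the colormap and output file name are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt
from matplotlib import cm, colors

# Placeholder values only; these are NOT real UNICEF figures.
countries = ["Country A", "Country B", "Country C", "Country D", "Country E"]
under5_population_millions = [110.0, 80.0, 35.0, 25.0, 20.0]
under5_mortality_per_1000 = [30.0, 7.0, 110.0, 6.0, 60.0]

# Map each mortality rate to a color: higher rate, more intense red.
norm = colors.Normalize(
    vmin=min(under5_mortality_per_1000),
    vmax=max(under5_mortality_per_1000),
)
cmap = plt.get_cmap("Reds")
bar_colors = [cmap(norm(rate)) for rate in under5_mortality_per_1000]

fig, ax = plt.subplots(figsize=(8, 4))
bars = ax.bar(countries, under5_population_millions, color=bar_colors)
ax.set_ylabel("Children under 5 (millions)")
ax.set_title("Under-5 population, colored by under-5 mortality rate")

# A colorbar explains what the bar colors mean.
mappable = cm.ScalarMappable(norm=norm, cmap=cmap)
mappable.set_array([])
fig.colorbar(mappable, ax=ax, label="Under-5 mortality (deaths per 1,000 live births)")

fig.tight_layout()
fig.savefig("under5_population_vs_mortality.png", dpi=150)
```

A sequential colormap such as "Reds" suits a quantity that only goes from low to high, which keeps the color reading consistent with the "most intense color" interpretation in the text.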

3. Applied Machine Learning in Python

In this course I learned to: describe how machine learning is different from descriptive statistics; create and evaluate data clusters; explain different approaches for creating predictive models; and build features that meet analysis needs.

Skills: Python Programming, Machine Learning (ML) Algorithms, Scikit-Learn

Final Assignment: In this assignment, two machine learning models were implemented, Linear Regression and Gradient Boosting Regressor, to predict the average engagement of the videos on the VideoLectures.Net platform. For this assignment I decided to go deeper by working from the original database and applying data cleansing to it.

One critical property of a video is engagement: how interesting or "engaging" it is for viewers, so that they decide to keep watching. Engagement is critical for learning, whether the instruction is coming from a video or any other source. There are many ways to define engagement with video, but one common approach is to estimate it by measuring how much of the video a user watches. If the video is not interesting and does not engage a viewer, they will typically abandon it quickly, e.g. only watch 5 or 10% of the total.
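As a rough sketch of the modeling step, the two regressors named above can be fitted and compared with scikit-learn as follows. The data here is synthetic and generated only for illustration: the three features, the formula for the target and the choice of mean absolute error as the metric are assumptions, not the VLE dataset or the notebook's actual setup. The target stands for the fraction of a video that is watched, between 0 and 1.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_videos = 500

# Synthetic stand-ins for video features (hypothetical, scaled to [0, 1]).
X = rng.uniform(0.0, 1.0, size=(n_videos, 3))

# Synthetic engagement target: fraction of the video watched, clipped to [0, 1].
noise = rng.normal(0.0, 0.05, size=n_videos)
y = np.clip(0.6 - 0.3 * X[:, 0] + 0.2 * X[:, 1] * X[:, 2] + noise, 0.0, 1.0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "LinearRegression": LinearRegression(),
    "GradientBoostingRegressor": GradientBoostingRegressor(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Engagement is a fraction, so keep predictions inside [0, 1].
    predictions = np.clip(model.predict(X_test), 0.0, 1.0)
    scores[name] = mean_absolute_error(y_test, predictions)
    print(f"{name}: MAE = {scores[name]:.3f}")
```

Evaluating both models on the same held-out split keeps the comparison fair. On real data, the relative ranking of the two models depends on the features and cleaning applied.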

The original owner of the dataset and an explanation of it can be found at the following link: https://github.com/sahanbull/VLE-Dataset/tree/master

The notebook can also be found on Kaggle, where it was first published: https://www.kaggle.com/felipemantilla77/vle-dataset-an-analizing-and-prediction-task

4. Applied Text Mining in Python

About

Here are my assignments for the Applied Data Science with Python specialization program from the University of Michigan on Coursera.
