This is a compilation of the deliverables of the 5 courses that are part of this specialization, each one organized in its corresponding folder:
- Introduction to Data Science in Python
- Applied Plotting, Charting & Data Representation in Python
- Applied Machine Learning in Python
- Applied Text Mining in Python
- Applied Social Network Analysis in Python
In this course (Introduction to Data Science in Python) I learned to: understand techniques such as lambdas and manipulating CSV files; describe common Python functionality and features used for data science; query DataFrame structures for cleaning and processing; and explain distributions, sampling, and t-tests.
Skills: Python Programming, NumPy, Pandas, Data Cleansing
Final Assignment: How does a team's win/loss ratio correlate with the population of the city it is in? The last part of this course asks for the correlation between a team's win/loss ratio and the population of its city. The data is extracted from Wikipedia, so it first had to be cleansed (correcting team names and removing stray characters) before the tables could be joined to answer the question.
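The cleanse, join, and correlate steps described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (`city`, `W`, `L`, `population`), the helper names, the toy rows, and the definition of the ratio as wins / (wins + losses) are all assumptions made for the example.

```python
import re

import pandas as pd


def clean_name(raw: str) -> str:
    """Remove bracketed footnotes like '[note 1]' and stray '*' or '+' markers."""
    no_notes = re.sub(r"\[.*?\]", "", raw)
    return re.sub(r"[*+]", "", no_notes).strip()


def win_loss_population_corr(teams: pd.DataFrame, cities: pd.DataFrame) -> float:
    """Pearson correlation between per-city win/loss ratio and city population."""
    wins = teams["W"].astype(int)
    losses = teams["L"].astype(int)
    teams = teams.assign(
        city=teams["city"].map(clean_name),
        ratio=wins / (wins + losses),  # assumed definition of the ratio
    )
    cities = cities.assign(
        city=cities["city"].map(clean_name),
        population=cities["population"].astype(float),
    )
    # A city with several teams contributes the average ratio of its teams.
    per_city = teams.groupby("city", as_index=False)["ratio"].mean()
    merged = per_city.merge(cities, on="city", how="inner")
    return merged["ratio"].corr(merged["population"])


# Toy rows (hypothetical, not the Wikipedia data) with the kind of noise
# that has to be removed before the tables can be joined.
teams = pd.DataFrame(
    {
        "city": ["Alpha*", "Beta[note 1]", "Gamma", "Alpha"],
        "W": [10, 5, 8, 6],
        "L": [10, 15, 2, 4],
    }
)
cities = pd.DataFrame(
    {
        "city": ["Alpha", "Beta", "Gamma[2]"],
        "population": ["100", "200", "300"],
    }
)

r = win_loss_population_corr(teams, cities)
print(f"Pearson r = {r:.3f}")
```

Cleaning both tables with the same function before merging is what keeps the inner join from silently dropping cities whose names differ only by a footnote marker.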
In this course (Applied Plotting, Charting & Data Representation in Python) I learned to: describe what makes a good or bad visualization; understand best practices for creating basic charts; identify the functions that are best for particular problems; and create a visualization using matplotlib.
Skills: Python Programming, Data Visualization, Matplotlib
Final Assignment: Personal Project: What is the relationship between the mortality rate of children under 5 years of age in the 5 countries with the highest populations in the world? The task was to find two datasets and propose a question that could be answered through a visualization, following the best practices seen throughout the course. I decided to work with UNICEF databases to answer the question. The results can be reviewed in the course folder.
Chart made by me from UNICEF open data.
This graph provides an approximation of the distribution of the world population of children under 5 years of age in 2021. The top three places go to India, China and Nigeria, and the ranking is made up mostly of Asian and African countries.
In addition, the mortality rate of children under 5 years of age in these countries is represented by the color of each bar; the African countries in the ranking are the ones with the most intense color.
Therefore, the graph suggests that less developed countries, which lack a good health system and also suffer from poverty and limited access to education (including sex education), have a higher mortality rate in children under 5 years of age than more developed countries such as China, USA, Brazil and even India.
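The encoding described above (bar length for population, bar color for mortality rate) can be reproduced with a sketch like the one below. It is not the project's plotting code: the function name, the `Reds` colormap, the axis labels, and especially the numbers are placeholders chosen for illustration, not UNICEF figures.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display

import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize


def plot_population_with_mortality(countries, population, mortality):
    """Horizontal bars sized by population and shaded by mortality rate."""
    norm = Normalize(vmin=min(mortality), vmax=max(mortality))
    cmap = plt.get_cmap("Reds")
    colors = [cmap(norm(value)) for value in mortality]

    fig, ax = plt.subplots(figsize=(8, 4))
    ax.barh(countries, population, color=colors)
    ax.invert_yaxis()  # largest population at the top
    ax.set_xlabel("Children under 5 (millions)")
    ax.set_title("Under-5 population, shaded by under-5 mortality rate")

    # The colorbar explains what the bar shading means.
    mappable = ScalarMappable(norm=norm, cmap=cmap)
    mappable.set_array([])
    fig.colorbar(mappable, ax=ax, label="Deaths per 1,000 live births")
    fig.tight_layout()
    return fig, ax


# Placeholder values only: made-up numbers, NOT taken from UNICEF data.
countries = ["Country A", "Country B", "Country C"]
population = [120.0, 80.0, 35.0]
mortality = [30.0, 7.0, 110.0]

fig, ax = plot_population_with_mortality(countries, population, mortality)
fig.savefig("under5_population_mortality.png", dpi=150)
```

A sequential colormap is used here because mortality rate is an ordered quantity, so darker bars read naturally as higher rates.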
In this course (Applied Machine Learning in Python) I learned to: describe how machine learning differs from descriptive statistics; create and evaluate data clusters; explain different approaches for creating predictive models; and build features that meet analysis needs.
Skills: Python Programming, Machine Learning (ML) Algorithms, Scikit-Learn
Final Assignment: In this assignment, two machine learning models were implemented, Linear Regression and Gradient Boosting Regressor, to predict the average engagement of the videos on the VideoLectures.Net platform. For this assignment I decided to go deeper by working with the original dataset and applying data cleansing to it.
One critical property of a video is engagement: how interesting or "engaging" it is for viewers, so that they decide to keep watching. Engagement is critical for learning, whether the instruction is coming from a video or any other source. There are many ways to define engagement with video, but one common approach is to estimate it by measuring how much of the video a user watches. If the video is not interesting and does not engage a viewer, they will typically abandon it quickly, e.g. only watch 5 or 10% of the total.
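A comparison of the two model families named for this assignment might look like the sketch below. It is only an outline under stated assumptions: the features and the engagement target are synthetic stand-ins generated with NumPy (not the VLE dataset), the function name `compare_regressors` is hypothetical, and R² on a held-out split is just one possible metric.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def compare_regressors(X, y, seed=0):
    """Fit both regressors on a train split and return their test-set R² scores."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    models = {
        "linear_regression": LinearRegression(),
        "gradient_boosting": GradientBoostingRegressor(random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test))
    return scores


# Synthetic stand-in data: three numeric features and an "engagement" target
# in [0, 1], read as the fraction of a video that is watched.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(300, 3))
noise = rng.normal(0.0, 0.02, size=300)
y = np.clip(0.2 + 0.5 * X[:, 0] + 0.3 * np.sin(3 * X[:, 1]) + noise, 0.0, 1.0)

scores = compare_regressors(X, y)
for name, score in scores.items():
    print(f"{name}: R^2 = {score:.3f}")
```

Holding out a test split matters here because gradient boosting can fit the training data very closely, so only unseen data gives a fair comparison with the linear baseline.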
The original owner of the dataset and an explanation of it can be found at the following link: https://github.com/sahanbull/VLE-Dataset/tree/master
The notebook can also be found on Kaggle, where it was first created: https://www.kaggle.com/felipemantilla77/vle-dataset-an-analizing-and-prediction-task