-> https://linkedin.com/in/arjuaman
In this regression task we will predict the percentage of marks that a student is expected to score based upon the number of hours they studied. This is a simple linear regression task as it involves just two variables.
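For reference, here is the simple linear regression model and its standard ordinary least-squares estimates. In this notation (chosen here, not taken from the original), x is hours studied, y is percentage score, and there are n observations:

```latex
\begin{aligned}
\hat{y} &= \beta_0 + \beta_1 x \\
\beta_1 &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \\
\beta_0 &= \bar{y} - \beta_1 \bar{x}
\end{aligned}
```

Once β0 and β1 are fitted, the prediction for 9.25 hours is β0 + 9.25·β1.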
Data can be found at http://bit.ly/w-data
Question: What will be the predicted score if a student studies for 9.25 hrs a day?
Answer: After analyzing the data, splitting it into training and test sets, and plotting it, we can see the linear pattern:
After the EDA, and after training the model on all of the training data, we have:
Hence, if a student studies for 9.25 hours a day, the predicted score is 95.35%.
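Below is a minimal scikit-learn sketch of the split, fit and predict steps. It assumes an 80/20 split with `random_state=0`, because the original split parameters are not stated here. The helper names `fit_hours_model` and `predict_score` are hypothetical. The test-set mean absolute error is included only as an example metric.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split


def fit_hours_model(hours, scores, test_size=0.2, random_state=0):
    """Fit score ~ hours on a training split and return (model, test MAE).

    test_size=0.2 and random_state=0 are assumed values, not the original ones.
    """
    X = np.asarray(hours, dtype=float).reshape(-1, 1)  # sklearn expects a 2-D feature matrix
    y = np.asarray(scores, dtype=float)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    model = LinearRegression().fit(X_train, y_train)
    mae = mean_absolute_error(y_test, model.predict(X_test))
    return model, mae


def predict_score(model, hours):
    """Predicted percentage for a single number of study hours."""
    return float(model.predict(np.array([[hours]]))[0])
```

Suppose the CSV at http://bit.ly/w-data has columns named `Hours` and `Scores` (an assumption). Then you could load it with `df = pd.read_csv("http://bit.ly/w-data")`, run `model, mae = fit_hours_model(df["Hours"], df["Scores"])`, and get the 9.25-hour prediction from `predict_score(model, 9.25)`. The exact figure depends on which rows land in the training split.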
Question: From the given ‘Iris’ dataset, predict the optimum number of clusters and represent it visually.
Dataset: https://drive.google.com/file/d/11Iq7YvbWZbt8VXjfm06brx66b10YiwK-/view?usp=sharing
Answer: After doing the EDA, we can see from the dataset that:
After further cleaning, and outputting the target variable as a function of the features:
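This EDA step can be sketched without downloading the CSV by using scikit-learn's bundled copy of Iris. That substitution is an assumption. It is essentially the same 150-sample dataset, but the species are labelled `setosa`, `versicolor` and `virginica` rather than `Iris-setosa` etc. The sketch separates the features from the target and summarizes each feature per species:

```python
from sklearn.datasets import load_iris

# Stand-in for the CSV from the Drive link (assumption: bundled scikit-learn copy).
iris = load_iris(as_frame=True)
X = iris.data  # four numeric features, in cm
y = iris.target.map(dict(enumerate(iris.target_names)))  # integer codes -> species names

# Per-species mean of every feature: a compact numeric view of what the EDA plots show.
species_means = X.groupby(y).mean()
print(species_means.round(2))
```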
Now, to find the optimum number of clusters, we use the elbow method.
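The elbow method can be sketched with scikit-learn's `KMeans`. The idea is to fit k = 1..10 clusters on the four numeric features and record the within-cluster sum of squares (`inertia_`) for each k. The bundled Iris data, `n_init=10` and `random_state=0` are choices made for this sketch, not settings from the original:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data  # features only; species labels are not used for clustering

# Within-cluster sum of squares (WCSS, sklearn's inertia_) for k = 1..10.
inertias = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)

for k, wcss in zip(range(1, 11), inertias):
    print(f"k={k:2d}  WCSS={wcss:8.2f}")
```

If you plot `inertias` against k, the curve drops steeply and then flattens. The k at the bend (the "elbow") is the one chosen.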
From the elbow plot, we can see that the optimum number of clusters is 3, so we do the final clustering with k = 3:
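Here is a sketch of the final clustering with k = 3, again on scikit-learn's bundled Iris data. Plotting petal length against petal width is a choice made here for illustration, and any pair of features could be used instead:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs without a display
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)  # one cluster index (0, 1 or 2) per sample

# Petal length (column 2) vs petal width (column 3), coloured by cluster,
# with the three centroids marked.
fig, ax = plt.subplots()
ax.scatter(X[:, 2], X[:, 3], c=labels, cmap="viridis", s=20)
ax.scatter(kmeans.cluster_centers_[:, 2], kmeans.cluster_centers_[:, 3],
           c="red", marker="x", s=100, label="centroids")
ax.set_xlabel("petal length (cm)")
ax.set_ylabel("petal width (cm)")
ax.legend()
```

Call `fig.savefig("iris_kmeans.png")` to write the figure to disk. In an interactive session, drop the `Agg` line and call `plt.show()` instead.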
Question: For the given ‘Iris’ dataset, create a Decision Tree classifier and visualize it graphically. The purpose is that, if we feed any new data to this classifier, it should be able to predict the right class.
Answer: With the EDA already done in Task 3, I created a Decision Tree Classifier:
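A minimal sketch of training and drawing such a classifier uses scikit-learn's `DecisionTreeClassifier` and `plot_tree`. The stratified 80/20 split, `random_state=0` and the bundled Iris data are all assumptions, not the original notebook's settings:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; use plt.show() interactively instead
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0, stratify=iris.target
)

# Unpruned tree (default settings) fitted on the training split only.
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
test_acc = accuracy_score(y_test, clf.predict(X_test))
print(f"test accuracy: {test_acc:.3f}")

# Draw the tree: each node shows its split rule, gini impurity, sample counts and majority class.
fig, ax = plt.subplots(figsize=(12, 8))
plot_tree(clf, feature_names=iris.feature_names,
          class_names=list(iris.target_names), filled=True, ax=ax)
```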
It predicted 'Iris-versicolor' when given the mean of each feature as input. This is correct, as can be seen from the EDA part, i.e. the 2nd figure of Task 3.
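The mean-of-features check can be sketched as follows. It refits an unpruned tree on all 150 samples, which is an assumption, since the original may have fitted differently. Note that scikit-learn's copy names this class `versicolor`, not `Iris-versicolor`:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# Query point: the column-wise mean of the four features
# (sepal length, sepal width, petal length, petal width), as a 1 x 4 array.
mean_sample = iris.data.mean(axis=0).reshape(1, -1)
predicted = iris.target_names[clf.predict(mean_sample)[0]]
print(mean_sample.round(2), "->", predicted)
```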