Subject: Project covering various topics in machine learning. The source code covers the following assignments:
- Assignment on Regression technique. Download the temperature data from the link below. The data consists of month-wise temperatures of India, averaged over all places. Temperature values are recorded in degrees Celsius.
  - Data Set: https://www.kaggle.com/venky73/temperaturesof-india?select=temperatures.csv
  - A. Apply linear regression using a suitable library function and predict the month-wise temperature.
  - B. Assess the performance of the regression models using the MSE, MAE, and R-squared metrics.
  - C. Visualize the simple regression model.
- Assignment on Classification technique. Every year many students take the GRE exam to get admission to foreign universities. The data set contains GRE Score (out of 340), TOEFL Score (out of 120), University Rating (out of 5), Statement of Purpose strength (out of 5), Letter of Recommendation strength (out of 5), Undergraduate GPA (out of 10), Research Experience (0 = no, 1 = yes), and Admitted (0 = no, 1 = yes). Admitted is the target variable. The data set is available on Kaggle; its last column needs to be changed to 0 or 1. The counselor of the firm has to check whether a student will get admission based on the student's GRE score and academic score. To help the counselor make appropriate decisions, build a machine learning classifier using a decision tree to predict whether a student will get admission.
  - Data Set: https://www.kaggle.com/mohansacharya/graduate-admissions
  - Apply data pre-processing (Label Encoding, Data Transformation, …) techniques if necessary.
  - Perform data preparation (Train-Test Split).
- Assignment on Improving Performance of Classifier Models. SMS spam (sometimes called mobile phone spam) is any junk message delivered to a mobile phone as a text message via the Short Message Service (SMS). Use a probabilistic approach (Naive Bayes Classifier / Bayesian Network) to implement an SMS spam filtering system. SMS messages are categorized as SPAM or HAM using features such as message length, word count, unique keywords, etc. The data set consists of a single text file in which each line has the correct class followed by the raw message.
  - Data Set: http://archive.ics.uci.edu/ml/datasets/sms+spam+collection
  - A. Apply data pre-processing (Label Encoding, Data Transformation, …) techniques if necessary.
  - B. Perform data preparation (Train-Test Split).
  - C. Apply at least two machine learning algorithms and evaluate the models.
  - D. Apply cross-validation, evaluate the models, and compare performance.
  - E. Apply hyperparameter tuning, evaluate the models, and compare performance.
- Assignment on Clustering Techniques. Download the customer data set from the link below. It records the income of, and money spent by, customers visiting a shopping mall. The data set contains Customer ID, Gender, Age, Annual Income, and Spending Score. As the mall owner, you need to find the groups of customers who are most profitable for the mall. Apply at least two clustering algorithms (based on Spending Score) to find these groups.
  - Data Set: https://www.kaggle.com/shwetabh123/mall-customers
  - A. Apply data pre-processing (Label Encoding, Data Transformation, …) techniques if necessary.
  - B. Perform data preparation (Train-Test Split).
  - C. Apply the machine learning algorithm.
  - D. Evaluate the model.
  - E. Apply cross-validation and evaluate the model.
- Assignment on Association Rule Learning. Download the Market Basket Optimization data set from the link below. It comprises the transactions of a retail company over a period of one week: 7501 transaction records in total, each listing the items sold in one transaction. Using these transactions and the items in each, find the association rules between items. The data set has no header and its first row contains the first transaction, so specify header = None when loading it. Follow these steps:
  - Data Set: https://www.kaggle.com/hemanthkumar05/market-basket-optimization
  - A. Data preprocessing.
  - B. Generate the list of transactions from the data set.
  - C. Train the Apriori algorithm on the data set.
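
For the regression assignment, it can help to compute the three metrics by hand to see what a library function reports. The following is a minimal sketch in plain Python, not the project's actual code: it fits a one-variable least-squares line and scores it with MSE, MAE, and R-squared. The function names `fit_line` and `regression_metrics` are hypothetical, and the sample values are made up for illustration rather than taken from temperatures.csv.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def regression_metrics(y_true, y_pred):
    """Return (MSE, MAE, R-squared) for paired true and predicted values."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = mean(e ** 2 for e in errors)
    mae = mean(abs(e) for e in errors)
    y_bar = mean(y_true)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)  # total variation
    ss_res = sum(e ** 2 for e in errors)            # unexplained variation
    return mse, mae, 1 - ss_res / ss_tot


# Made-up illustrative values (year index, mean temperature in Celsius).
years = [1, 2, 3, 4, 5]
temps = [24.1, 24.3, 24.2, 24.6, 24.8]

slope, intercept = fit_line(years, temps)
predicted = [slope * x + intercept for x in years]
mse, mae, r2 = regression_metrics(temps, predicted)
print(f"slope={slope:.3f} intercept={intercept:.3f}")
print(f"MSE={mse:.4f} MAE={mae:.4f} R2={r2:.4f}")
```

An R-squared of 1 means the line explains all of the variation in the target, while 0 means it does no better than always predicting the mean.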
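
For the SMS spam assignment, the probabilistic approach can be illustrated with a small multinomial Naive Bayes classifier written from scratch. This is a sketch under stated assumptions, not the project's implementation: tokenization is a plain lowercase whitespace split, add-one (Laplace) smoothing is used, words never seen in training are ignored, and the four training messages are invented for the example rather than taken from the SMS Spam Collection. The function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(messages, labels):
    """Collect per-class word counts, class frequencies, and the vocabulary."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    for text, label in zip(messages, labels):
        word_counts[label].update(text.lower().split())
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    return word_counts, class_counts, vocabulary


def predict(text, model):
    """Return the class with the highest log posterior."""
    word_counts, class_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)  # log prior
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # assumption: skip words unseen in training
            count = word_counts[label][word]
            # Laplace smoothing keeps unseen (word, class) pairs non-zero.
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Invented toy messages, for illustration only.
train_messages = [
    "win a free prize now",
    "free entry win cash",
    "are we meeting for lunch",
    "call me when you get home",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = train_naive_bayes(train_messages, train_labels)
print(predict("free cash prize", model))
print(predict("are you home for lunch", model))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.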
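
For the association rule assignment, the idea behind Apriori can be sketched without any library: grow candidate itemsets level by level, keep only those that meet a minimum support, and derive rules from the survivors. The code below is a simplified illustration, not the project's code and not an optimized implementation. The function names, thresholds, and the five toy transactions are assumptions made for the example and are not drawn from the Market Basket Optimization data set.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search; returns {itemset: support}."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset.
        return sum(itemset <= basket for basket in baskets) / n

    current = {frozenset([item]) for basket in baskets for item in basket}
    result = {}
    k = 1
    while current:
        kept = {}
        for candidate in current:
            s = support(candidate)
            if s >= min_support:
                kept[candidate] = s
        result.update(kept)
        k += 1
        # Join step: unions of frequent sets that have exactly k items.
        candidates = {a | b for a in kept for b in kept if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in kept for s in combinations(c, k - 1))
        }
    return result


def association_rules(itemsets, min_confidence):
    """Return a list of (antecedent, consequent, confidence, lift) tuples."""
    rules = []
    for itemset, supp in itemsets.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                confidence = supp / itemsets[antecedent]
                if confidence >= min_confidence:
                    lift = confidence / itemsets[consequent]
                    rules.append((antecedent, consequent, confidence, lift))
    return rules


# Invented toy transactions, for illustration only.
toy_transactions = [
    ["milk", "bread"],
    ["milk", "bread", "eggs"],
    ["bread", "eggs"],
    ["milk", "eggs"],
    ["milk", "bread", "butter"],
]

itemsets = frequent_itemsets(toy_transactions, min_support=0.5)
for lhs, rhs, conf, lift in association_rules(itemsets, min_confidence=0.7):
    print(sorted(lhs), "->", sorted(rhs), f"conf={conf:.2f} lift={lift:.2f}")
```

A lift below 1, as in this toy data, indicates that the two items appear together less often than would be expected if they were independent, so a high confidence alone does not make a rule useful.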