# Machine Learning

**Machine learning is the science of getting computers to learn and act like humans do, and to improve their learning over time in an autonomous fashion, by feeding them data and information in the form of observations and real-world interactions.
There are many algorithms for getting machines to learn, from basic decision trees to clustering to layers of artificial neural networks, depending on the task you are trying to accomplish and on the type and amount of data you have available.**

**There are three main types of machine learning:**
1. Supervised Machine Learning 
2. Unsupervised Machine Learning 
3. Reinforcement Machine Learning 

# Supervised Machine Learning
**Supervised learning is a type of learning in which both the input and the desired output data are provided. The input and output data are labelled, which provides a learning basis for processing future data. These algorithms have a target (outcome, or dependent) variable that is to be predicted from a given set of predictors (independent variables). Using this set of variables, we learn a function that maps inputs to the desired outputs. Training continues until the model achieves the desired level of accuracy on the training data.**


# Applications of Supervised Machine Learning
1. Bioinformatics 
2. Quantitative structure-activity relationship (QSAR) modelling
3. Database marketing 
4. Handwriting recognition 
5. Information retrieval 
6. Learning to rank 
7. Information extraction 
8. Object recognition in computer vision 
9. Optical character recognition 
10. Spam detection 
11. Pattern recognition 



# Applications of Unsupervised Machine Learning
1. Human Behaviour Analysis 
2. Social Network Analysis to define groups of friends. 
3. Market Segmentation of companies by location, industry, vertical. 
4. Organizing computing clusters based on similar event patterns and processes. 


# Applications of Reinforcement Machine Learning
1. Resources management in computer clusters 
2. Traffic Light Control 
3. Robotics 
4. Web System Configuration 
5. Personalized Recommendations 
6. Deep Learning 


# Applying a machine learning model in six steps
1. Problem Definition 
2. Analyse Data 
3. Prepare Data 
4. Evaluate Algorithms
5. Improve Results 
6. Present Results 


# Factors that help in choosing an algorithm
1. Type of algorithm 
2. Parametrization 
3. Memory size 
4. Overfitting tendency 
5. Time of learning 
6. Time of predicting

# Linear Regression 
**Linear regression is a basic and commonly used type of predictive analysis. Regression estimates are used to explain the relationship between one dependent variable and one or more independent variables. With a single independent variable the model is:**

Y = a + bX, where
* Y – Dependent Variable 
* a – intercept 
* X – Independent variable 
* b – Slope 

**Example: University GPA' = (0.675)(High School GPA) + 1.097**
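As a sketch of where a and b come from: with one independent variable, ordinary least squares has a closed form, b = cov(X, Y) / var(X) and a = mean(Y) - b·mean(X). The NumPy function below is a minimal illustration, not the scikit-learn implementation used later; the data points are synthetic values generated from the GPA example line, not real student records.

```python
import numpy as np

def fit_simple_linear_regression(x, y):
    """Ordinary least squares for Y = a + bX with a single feature."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of X and Y divided by the variance of X
    b = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through (mean X, mean Y)
    a = y_mean - b * x_mean
    return a, b

# Synthetic, noise-free points generated from the GPA example line (illustration only)
hs_gpa = np.array([2.0, 2.5, 3.0, 3.5, 4.0])
uni_gpa = 0.675 * hs_gpa + 1.097

a, b = fit_simple_linear_regression(hs_gpa, uni_gpa)
print(f"a = {a:.3f}, b = {b:.3f}")  # a = 1.097, b = 0.675
```

With noisy real data the recovered a and b would only approximate the underlying line.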

**Library and Data**

In [None]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load the separate train and test files (the test set previously re-read train.csv)
train = pd.read_csv("../input/random-linear-regression/train.csv")
test = pd.read_csv("../input/random-linear-regression/test.csv")
# Drop rows with missing values
train = train.dropna()
test = test.dropna()
train.head()

**Model, plot and R² score**

In [None]:
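# Features are all columns except the last; the target is the last column
X_train = train.iloc[:, :-1].values
y_train = train.iloc[:, -1].values
X_test = test.iloc[:, :-1].values
y_test = test.iloc[:, -1].values

model = LinearRegression()
model.fit(X_train, y_train)

# For regression, the natural scores are R^2 and mean squared error, not "accuracy"
y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)

# Plot the training points together with the fitted line
plt.scatter(X_train, y_train, s=5, alpha=0.5)
plt.plot(X_train, model.predict(X_train), color='green')
plt.show()
print("R^2:", r2, "MSE:", mse)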

# Logistic Regression 
**It is a classification algorithm used when the response variable is categorical. The idea of logistic regression is to find a relationship between the features and the probability of a particular outcome.**
* odds = p(x) / (1 - p(x)) = probability of the event occurring / probability of the event not occurring

**Example: to predict whether a student passes or fails an exam, given the number of hours spent studying as a feature, the response variable has two values, pass and fail.**
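The odds formula ties into the model as follows: logistic regression assumes the log-odds are linear in the features, log(p/(1 - p)) = b0 + b1·x, which gives p = 1 / (1 + e^-(b0 + b1·x)). A small sketch for the pass/fail example, with made-up coefficients B0 = -4 and B1 = 1.5 (hypothetical values, not fitted to any data):

```python
import math

# Hypothetical coefficients for P(pass | hours studied); not fitted to real data
B0, B1 = -4.0, 1.5

def pass_probability(hours):
    """Logistic (sigmoid) function of the linear predictor b0 + b1 * hours."""
    z = B0 + B1 * hours
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Odds = probability of the event / probability of the event not occurring."""
    return p / (1.0 - p)

for hours in (1, 2, 4):
    p = pass_probability(hours)
    label = "pass" if p >= 0.5 else "fail"
    # The log of the odds recovers the linear predictor b0 + b1 * hours
    print(hours, round(p, 3), round(math.log(odds(p)), 3), label)
```

Predicting "pass" whenever p >= 0.5 is the same as predicting "pass" whenever the log-odds are non-negative.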

**Libraries and data**

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression


train = pd.read_csv("../input/titanic/train.csv")
test  = pd.read_csv('../input/titanic/test.csv')
train.head()

In [None]:
# One-hot encode the port of embarkation and drop the original column
ports = pd.get_dummies(train.Embarked, prefix='Embarked')
train = train.join(ports)
train.drop(['Embarked'], axis=1, inplace=True)
# Encode sex as 0/1
train['Sex'] = train['Sex'].map({'male': 0, 'female': 1})
# Target and features; drop identifiers and sparse or free-text columns
y = train['Survived'].copy()
X = train.drop(['Survived', 'Cabin', 'Ticket', 'Name', 'PassengerId'], axis=1)
# Impute missing ages with the median (assign back instead of chained inplace fillna)
X['Age'] = X['Age'].fillna(X['Age'].median())


**Model and Accuracy**

In [None]:
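from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Hold out 25% of the passengers to measure accuracy on unseen data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=5)
model = LogisticRegression(max_iter=500000)
model.fit(X_train, y_train)
# score() returns mean accuracy; evaluate on the held-out split, not the training data
model.score(X_test, y_test)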


# Support Vector Machine 
**Support vector machines (SVMs) are among the most popular and widely discussed machine learning algorithms. An SVM is primarily a classifier: it performs classification by constructing hyperplanes in a multidimensional space that separate cases of different class labels. SVMs support both classification and regression tasks and can handle multiple continuous and (suitably encoded) categorical variables.**

**Example: suppose one class is linearly separable from the other and we only have two features, such as an individual's height and hair length. We first plot these two variables in two-dimensional space, where each point has two coordinates, and then look for the line (hyperplane) that best separates the two classes.**
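To make the idea of a separating hyperplane concrete, here is a minimal linear soft-margin SVM trained by subgradient descent on the hinge loss, written in NumPy. This is an illustrative sketch, not how scikit-learn's SVC is trained (it uses a libsvm solver and, in the cell below, an RBF kernel); the height and hair-length points are hypothetical, and the values of lam, lr and epochs are arbitrary choices.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=1000):
    """Soft-margin linear SVM via full-batch subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b)))."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        # Only points inside the margin or misclassified contribute to the hinge term
        active = margins < 1
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    # The sign of the decision function says which side of the hyperplane a point is on
    return np.where(X @ w + b >= 0, 1, -1)

# Hypothetical toy data: [height in cm, hair length in cm], labels +1 / -1
X = np.array([[180, 5], [175, 3], [185, 8], [160, 30], [165, 25], [158, 35]], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1])

# Standardise so that neither feature dominates the margin just because of its units
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
w, b = train_linear_svm(X_std, y)
print(predict(X_std, w, b))
```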

**Libraries and Data**

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
data_svm = pd.read_csv("../input/svm-classification/UniversalBank.csv")
data_svm.head()

**Model and Accuracy**

In [None]:
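from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Columns 1-12 are used as features and the last column as the class label
X = data_svm.iloc[:, 1:13].values
y = data_svm.iloc[:, -1].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
# RBF-kernel SVMs are sensitive to feature scale, so standardise inside a pipeline
classifier = make_pipeline(StandardScaler(), SVC(kernel='rbf', random_state=0))
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, y_pred))
# 10-fold cross-validated accuracy on the training set
accuracies = cross_val_score(estimator=classifier, X=X_train, y=y_train, cv=10)
accuracies.mean()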

# Naive Bayes Algorithm 
**A naive Bayes classifier is not a single algorithm but a family of machine learning algorithms that use probability theory (Bayes' theorem) to classify data, assuming that the predictors are independent of each other. It is easy to build and particularly useful for very large data sets. Despite its simplicity, naive Bayes can sometimes perform as well as, or better than, much more sophisticated classification methods.**

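**Example: given a set of emails, find the spam ones. A simple keyword filter moves messages containing certain words into a spam folder; a naive Bayes spam filter instead learns from labelled emails how likely each word is in spam and in non-spam messages, and combines these probabilities to classify a new email.**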
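The word-probability idea can be sketched as a multinomial naive Bayes classifier with add-one (Laplace) smoothing, in plain Python. The four-email corpus is hypothetical, and note that the notebook cell below uses a different variant, GaussianNB, which models continuous features rather than word counts.

```python
import math
from collections import Counter

# Hypothetical tiny labelled corpus (illustration only)
train_docs = [
    ("win money now", "spam"),
    ("limited offer win prize", "spam"),
    ("meeting schedule for monday", "ham"),
    ("project report attached", "ham"),
]

def train_nb(docs):
    class_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in class_counts}
    for text, label in docs:
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab

def predict_nb(text, class_counts, word_counts, vocab):
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        # Start from the log prior P(class)
        score = math.log(n_docs / total_docs)
        n_words = sum(word_counts[label].values())
        for w in text.split():
            if w not in vocab:
                continue  # ignore words never seen in training
            # "Naive" step: add log P(word | class) independently for each word,
            # with add-one smoothing so unseen (word, class) pairs are not zero
            score += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

model = train_nb(train_docs)
print(predict_nb("win a prize now", *model))          # spam
print(predict_nb("monday project meeting", *model))   # ham
```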

**Libraries and Data**

In [None]:
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
data = pd.read_csv('../input/classification-suv-dataset/Social_Network_Ads.csv')
data_nb = data
data_nb.head()

**Model and Accuracy**

In [None]:
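# Columns 2 and 3 are the features; column 4 is the label
X = data_nb.iloc[:, [2, 3]].values
y = data_nb.iloc[:, 4].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
# Standardise using statistics from the training split only
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)
# Gaussian naive Bayes: each feature is modelled as normally distributed within each class
classifier = GaussianNB()
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print(acc)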

# KNN 
**KNN does not learn an explicit model; instead it stores the entire training data set and uses it as its representation. The output is the class with the highest frequency among the K most similar instances: each neighbour votes for its class, and the class with the most votes is taken as the prediction.**

**Example: should the bank give a loan to an individual? Would that individual default on the loan? Is the person closer in characteristics to people who defaulted or to people who did not default on their loans?**
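Because KNN simply stores the training data, prediction is just a distance computation followed by a vote. A from-scratch sketch in NumPy, assuming two hypothetical numeric features for loan applicants (income in thousands and debt-to-income ratio) and label 1 for applicants who defaulted:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Predict the class of x by majority vote among its k nearest training points."""
    # Euclidean distance (Minkowski with p=2) from x to every stored training point
    distances = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]
    # Each of the k neighbours votes for its own class; the most common class wins
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

# Hypothetical applicants: [income in thousands, debt-to-income ratio]; 1 = defaulted
X_train = np.array([[25, 0.60], [30, 0.55], [28, 0.70], [80, 0.10], [95, 0.20], [70, 0.15]])
y_train = np.array([1, 1, 1, 0, 0, 0])

print(knn_predict(X_train, y_train, np.array([32, 0.50])))  # 1
print(knn_predict(X_train, y_train, np.array([85, 0.12])))  # 0
```

Here income dominates the Euclidean distance because it is on a much larger scale, which is one reason the notebook standardises features with StandardScaler before fitting KNeighborsClassifier.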


**Libraries and Data**

In [None]:
from sklearn.neighbors import KNeighborsClassifier
knn = data
knn.head()

**Model and Accuracy**

In [None]:
X = knn.iloc[:, [2,3]].values
y = knn.iloc[:, 4].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 0)
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)
# k = 5 neighbours; the Minkowski metric with p=2 is the Euclidean distance
classifier = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)
classifier.fit(X_train,y_train)
y_pred=classifier.predict(X_test)
acc=accuracy_score(y_test, y_pred)
print(acc)

# Random Forest 
**Random forest is a collection of trees (a forest): it builds multiple decision trees and merges their predictions to get a more accurate and stable result. It can be used for both classification and regression problems.**

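**Example (sampling with replacement): suppose we have a bowl of 100 unique numbers from 0 to 99 and want to draw a random sample from it. If we put each number back in the bowl after drawing it, it may be selected more than once. Random forest trains each tree on such a bootstrap sample of the training data.**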
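The bowl example is the bootstrap sampling step that random forest applies for each tree. A NumPy sketch (the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
bowl = np.arange(100)  # 100 unique numbers, 0..99

# Sampling WITH replacement: every draw puts the number back, so repeats are possible
bootstrap_sample = rng.choice(bowl, size=len(bowl), replace=True)

n_unique = len(np.unique(bootstrap_sample))
# Numbers never drawn are "out-of-bag" for the tree trained on this sample
out_of_bag = np.setdiff1d(bowl, bootstrap_sample)
print("unique numbers in the sample:", n_unique)
print("out-of-bag count:", len(out_of_bag))
```

On average only about 63% of the original numbers (1 - 1/e for large samples) appear in a bootstrap sample of the same size, so each tree sees a somewhat different view of the data.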

**Libraries and Data**

In [None]:
from sklearn.ensemble import RandomForestClassifier
rf = data
rf.head()

**Model and Accuracy**

In [None]:
X = rf.iloc[:, [2,3]].values
y = rf.iloc[:, 4].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.15, random_state = 0)
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)
# 1000 trees; each split is chosen by information gain (entropy)
classifier = RandomForestClassifier(n_estimators=1000, criterion='entropy', random_state=0)
classifier.fit(X_train,y_train)
y_pred=classifier.predict(X_test)
acc=accuracy_score(y_test, y_pred)
print(acc)

# Decision Tree
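**A decision tree is a supervised machine learning algorithm, used mainly for classification (it can also be used for regression), and it is simple to understand, interpret and use. The idea is to split the full dataset (the root) recursively into smaller, purer subsets until reaching the leaves, each of which gives a prediction.**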
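The splits are chosen to make the child nodes as pure as possible. With criterion="entropy", as in the cell below, purity is measured by Shannon entropy and a split is scored by its information gain. A plain-Python sketch with a hypothetical node of 10 labels:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Reduction in entropy from splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(parent) - weighted

# Hypothetical node with 5 "yes" and 5 "no" labels, and one candidate split
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

print(round(entropy(parent), 3))  # 1.0
print(round(information_gain(parent, left, right), 3))
```

The tree evaluates many candidate splits like this one and keeps the split with the largest information gain.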

In [None]:
from sklearn.tree import DecisionTreeClassifier
dt = data
dt.head()

In [None]:
X = dt.iloc[:, [2,3]].values
y = dt.iloc[:, 4].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 0)
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)
# criterion="entropy" chooses each split by information gain
classifier = DecisionTreeClassifier(criterion="entropy", random_state=0)
classifier.fit(X_train,y_train)
y_pred=classifier.predict(X_test)
acc=accuracy_score(y_test, y_pred)
print(acc)

# Gradient Boosting
**Gradient boosting is a supervised machine learning algorithm. Boosting means combining many weak learners into a strong one: trees are added one at a time, and each new tree is fit to the errors (the negative gradient of the loss) of the ensemble built so far.**
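The "each new tree fits the errors of the previous ones" idea can be sketched for regression with squared loss, where the negative gradient is simply the residual y - F(x). This from-scratch NumPy version uses depth-1 trees (stumps) on synthetic 1-D data; it illustrates the principle only and is not scikit-learn's GradientBoostingClassifier, which uses a classification loss and deeper trees.

```python
import numpy as np

def fit_stump(x, residuals):
    """Depth-1 regression tree: the single threshold on x that minimises squared error."""
    best = None
    for threshold in np.unique(x)[:-1]:
        left = x <= threshold
        left_value, right_value = residuals[left].mean(), residuals[~left].mean()
        error = ((residuals - np.where(left, left_value, right_value)) ** 2).sum()
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    _, threshold, left_value, right_value = best
    return lambda z: np.where(z <= threshold, left_value, right_value)

def gradient_boost(x, y, n_trees=50, learning_rate=0.1):
    """Squared-loss gradient boosting: each stump is fit to the current residuals,
    i.e. the negative gradient of 0.5 * (y - F)**2 with respect to F."""
    base = y.mean()
    F = np.full(len(y), base)
    stumps = []
    for _ in range(n_trees):
        stump = fit_stump(x, y - F)
        # Add a shrunken copy of the new weak learner to the ensemble
        F = F + learning_rate * stump(x)
        stumps.append(stump)
    return lambda z: base + learning_rate * sum(s(z) for s in stumps)

# Synthetic 1-D data (illustration only)
x = np.linspace(0, 6, 60)
y = np.sin(x)

model = gradient_boost(x, y)
baseline_mse = np.mean((y - y.mean()) ** 2)
boosted_mse = np.mean((y - model(x)) ** 2)
print(f"training MSE: mean-only {baseline_mse:.4f}, boosted {boosted_mse:.4f}")
```

The learning rate shrinks each tree's contribution, so many small corrections are combined instead of trusting any single weak learner.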

**Libraries and Data**

In [None]:
from sklearn.ensemble import GradientBoostingClassifier
gb = data
gb.head()

**Model and Accuracy**

In [None]:
X = gb.iloc[:, [2,3]].values
y = gb.iloc[:, 4].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 0)
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)
gbk = GradientBoostingClassifier()
gbk.fit(X_train, y_train)
y_pred = gbk.predict(X_test)
# Score this model's predictions, not the y_pred left over from an earlier cell
acc = accuracy_score(y_test, y_pred)
print(acc)

# Light GBM

**LightGBM is a gradient boosting framework that uses tree-based learning algorithms. It is designed to be distributed and efficient, with the following advantages:**

1. Faster training speed and higher efficiency.
2. Lower memory usage.
3. Better accuracy.
4. Support of parallel and GPU learning.
5. Capable of handling large-scale data.

**Library and Data**

In [None]:
import lightgbm as lgb

from sklearn.model_selection import KFold
from sklearn import preprocessing


train = pd.read_csv("../input/house-prices-advanced-regression-techniques/train.csv")
test = pd.read_csv("../input/house-prices-advanced-regression-techniques/test.csv")
data = pd.concat([train, test], sort=False)
data = data.reset_index(drop=True)
data.head()

**Preprocessing**

In [None]:
nans=pd.isnull(data).sum()

data['MSZoning']  = data['MSZoning'].fillna(data['MSZoning'].mode()[0])
data['Utilities'] = data['Utilities'].fillna(data['Utilities'].mode()[0])
data['Exterior1st'] = data['Exterior1st'].fillna(data['Exterior1st'].mode()[0])
data['Exterior2nd'] = data['Exterior2nd'].fillna(data['Exterior2nd'].mode()[0])

data["BsmtFinSF1"]  = data["BsmtFinSF1"].fillna(0)
data["BsmtFinSF2"]  = data["BsmtFinSF2"].fillna(0)
data["BsmtUnfSF"]   = data["BsmtUnfSF"].fillna(0)
data["TotalBsmtSF"] = data["TotalBsmtSF"].fillna(0)
data["BsmtFullBath"] = data["BsmtFullBath"].fillna(0)
data["BsmtHalfBath"] = data["BsmtHalfBath"].fillna(0)
data["BsmtQual"] = data["BsmtQual"].fillna("None")
data["BsmtCond"] = data["BsmtCond"].fillna("None")
data["BsmtExposure"] = data["BsmtExposure"].fillna("None")
data["BsmtFinType1"] = data["BsmtFinType1"].fillna("None")
data["BsmtFinType2"] = data["BsmtFinType2"].fillna("None")

data['KitchenQual']  = data['KitchenQual'].fillna(data['KitchenQual'].mode()[0])
data["Functional"]   = data["Functional"].fillna("Typ")
data["FireplaceQu"]  = data["FireplaceQu"].fillna("None")

data["GarageType"]   = data["GarageType"].fillna("None")
data["GarageYrBlt"]  = data["GarageYrBlt"].fillna(0)
data["GarageFinish"] = data["GarageFinish"].fillna("None")
data["GarageCars"] = data["GarageCars"].fillna(0)
data["GarageArea"] = data["GarageArea"].fillna(0)
data["GarageQual"] = data["GarageQual"].fillna("None")
data["GarageCond"] = data["GarageCond"].fillna("None")

data["PoolQC"] = data["PoolQC"].fillna("None")
data["Fence"]  = data["Fence"].fillna("None")
data["MiscFeature"] = data["MiscFeature"].fillna("None")
data['SaleType']    = data['SaleType'].fillna(data['SaleType'].mode()[0])
# Assign back rather than using chained inplace=True, which newer pandas may ignore
data['LotFrontage'] = data['LotFrontage'].interpolate(method='linear')
data["Electrical"]  = data.groupby("YearBuilt")['Electrical'].transform(lambda x: x.fillna(x.mode()[0]))
data["Alley"] = data["Alley"].fillna("None")

data["MasVnrType"] = data["MasVnrType"].fillna("None")
data["MasVnrArea"] = data["MasVnrArea"].fillna(0)
nans=pd.isnull(data).sum()
nans[nans>0]

In [None]:
# Collect the names of columns that hold strings (categorical features)
_list = [col for col in data.columns if isinstance(data[col][0], str)]

# Label-encode each categorical column as integers
le = preprocessing.LabelEncoder()
for li in _list:
    data[li] = le.fit_transform(data[li])

train, test = data[:len(train)], data[len(train):]

X = train.drop(columns=['SalePrice', 'Id']) 
y = train['SalePrice']

test = test.drop(columns=['SalePrice', 'Id'])

**Model and Accuracy**

In [None]:
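from sklearn.model_selection import cross_val_score

kfold = KFold(n_splits=5, random_state=2020, shuffle=True)

model_lgb = lgb.LGBMRegressor(objective='regression', num_leaves=5,
                              learning_rate=0.05, n_estimators=720,
                              max_bin=55, bagging_fraction=0.8,
                              bagging_freq=5, feature_fraction=0.2319,
                              feature_fraction_seed=9, bagging_seed=9,
                              min_data_in_leaf=6, min_sum_hessian_in_leaf=11)
# Estimate out-of-sample R^2 with 5-fold cross-validation instead of scoring on training data
scores = cross_val_score(model_lgb, X, y, cv=kfold, scoring='r2')
print(scores.mean())
# Refit on all training data before predicting the test set
model_lgb.fit(X, y)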


# **LDA (Linear Discriminant Analysis)**

**LDA is a classifier with a linear decision boundary, generated by fitting class-conditional densities to the data and using Bayes' rule. The model fits a Gaussian density to each class, assuming that all classes share the same covariance matrix. It is used in statistics, pattern recognition and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting combination may be used as a linear classifier or, more commonly, for dimensionality reduction before later classification.**
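For two classes, the shared-covariance assumption makes the Bayes decision rule linear: predict class 1 when w·x + c > 0, with w = S^-1 (mu1 - mu0) and c = -0.5 (mu0 + mu1)·w + log(pi1 / pi0), where S is the pooled covariance, mu0 and mu1 the class means and pi0 and pi1 the class priors. A minimal NumPy sketch on synthetic Gaussian data (not the notebook's dataset):

```python
import numpy as np

def fit_lda(X, y):
    """Two-class LDA: one Gaussian per class, all sharing a pooled covariance matrix."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance estimate (the shared covariance matrix)
    S = ((X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)) / (len(X) - 2)
    prior0, prior1 = len(X0) / len(X), len(X1) / len(X)
    # Bayes' rule with a shared covariance gives a linear score w.x + c
    w = np.linalg.solve(S, mu1 - mu0)
    c = -0.5 * (mu0 + mu1) @ w + np.log(prior1 / prior0)
    return w, c

def predict_lda(X, w, c):
    return (X @ w + c > 0).astype(int)

# Synthetic data (illustration only): two Gaussian classes with the same covariance
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(100, 2)),
               rng.normal([3.0, 3.0], 1.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

w, c = fit_lda(X, y)
accuracy = np.mean(predict_lda(X, w, c) == y)
# Projecting onto w reduces the 2-D data to a single discriminant axis
z = X @ w
print("training accuracy:", accuracy, "projected shape:", z.shape)
```

The projection z is the dimensionality-reduction use of LDA mentioned above; thresholding it is the classifier use.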

**Library and Data**

In [None]:
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
# `data` was overwritten by the house-prices table above, so reload the Social Network Ads data
lda = pd.read_csv('../input/classification-suv-dataset/Social_Network_Ads.csv')
lda.head()

**Model and Accuracy**

In [None]:
X = lda.iloc[:, [2, 3]].values
y = lda.iloc[:, 4].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)
Model = LinearDiscriminantAnalysis()
Model.fit(X_train, y_train)
y_pred = Model.predict(X_test)
# accuracy_score expects (y_true, y_pred)
print('accuracy is', accuracy_score(y_test, y_pred))

# K-Means Algorithm 
K-means clustering is a type of unsupervised learning, used when you have unlabelled data. The goal of the algorithm is to find groups (clusters) in the data.

**Steps of the algorithm:**
1. Choose the number of clusters k in advance.
2. Select k points at random as the initial cluster centers.
3. Assign each object to its closest cluster center according to the Euclidean distance.
4. Recalculate the centroid (mean) of all objects in each cluster.
5. Repeat steps 3 and 4 until the cluster assignments no longer change.
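The steps above translate almost line by line into NumPy. A minimal sketch of Lloyd's algorithm with random initial centres drawn from the data, run on synthetic blobs; scikit-learn's KMeans, used below, defaults to the k-means++ initialisation instead of purely random centres.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Pick k distinct data points at random as the initial centres
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centre (Euclidean distance)
        distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Move each centre to the mean of its assigned points
        # (keep the old centre if a cluster ends up empty)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Stop once the centres no longer move
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Synthetic data (illustration only): three blobs of 50 points each
rng = np.random.default_rng(42)
blob_centres = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([rng.normal(c, 0.5, size=(50, 2)) for c in blob_centres])

centers, labels = kmeans(X, k=3)
print(centers.round(2))
```

Random initialisation can occasionally converge to a poor local optimum, which is why practical implementations use smarter seeding and/or several restarts.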

**Examples: behavioural segmentation, such as segmenting users by purchase history or by activity on an application, website or platform; separating valid activity groups from bots.**


**Libraries and Data**

In [None]:
from sklearn.cluster import KMeans
km = pd.read_csv("../input/k-mean/km.csv")
km.head()

**Checking for number of clusters**

In [None]:
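K_clusters = range(1, 8)
kmeans = [KMeans(n_clusters=i) for i in K_clusters]
# Cluster on both coordinates (the longitude column was previously loaded but unused)
coords = km[['latitude', 'longitude']]
# score() returns the negative inertia, so higher is better
score = [kmeans[i].fit(coords).score(coords) for i in range(len(kmeans))]
# Look for the "elbow" where adding more clusters stops improving the score much
plt.plot(K_clusters, score)
plt.xlabel('Number of Clusters')
plt.ylabel('Score')
plt.show()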

**Fitting Model**

In [None]:
kmeans = KMeans(n_clusters=3, init='k-means++')
# Fit once on the feature columns and store each point's cluster label
km['cluster_label'] = kmeans.fit_predict(km[km.columns[1:3]])
centers = kmeans.cluster_centers_
labels = km['cluster_label']
km.cluster_label.unique()

**Plotting Clusters**

In [None]:
km.plot.scatter(x = 'latitude', y = 'longitude', c=labels, s=50, cmap='viridis')
plt.scatter(centers[:, 0], centers[:, 1], c='black', s=100, alpha=0.5)


# CNN

**Library and Data**

In [None]:
from sklearn.model_selection import train_test_split
# Use tensorflow.keras consistently instead of mixing it with the standalone keras package
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPool2D
import tensorflow as tf
train_data = pd.read_csv("../input/digit-recognizer/train.csv")
test_data = pd.read_csv("../input/digit-recognizer/test.csv")
train_data.head()

**Preprocessing and Data Split**

In [None]:
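# Pixel values as floats; labels are the digit classes
X = np.array(train_data.drop("label", axis=1)).astype('float32')
y = np.array(train_data['label']).astype('float32')

# Show the first nine digits (create the figure before plotting into it)
plt.figure(figsize=(10, 10))
for i in range(9):
    plt.subplot(3, 3, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(X[i].reshape(28, 28), cmap=plt.cm.binary)
    plt.xlabel(y[i])
plt.show()

# Scale pixels to [0, 1] and reshape to 28x28 single-channel images
X = X / 255.0
X = X.reshape(-1, 28, 28, 1)
# One-hot encode the labels for categorical cross-entropy
y = to_categorical(y)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)
# Apply the same preprocessing to the competition test set
X_test = np.array(test_data).astype('float32')
X_test = X_test / 255.0
X_test = X_test.reshape(-1, 28, 28, 1)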


**Model**

In [None]:
model = Sequential()
model.add(Conv2D(filters = 32, kernel_size = (5,5),padding = 'Same', 
                 activation ='relu', input_shape = (28,28,1)))
model.add(Conv2D(filters = 32, kernel_size = (5,5),padding = 'Same', 
                 activation ='relu'))
model.add(MaxPool2D(pool_size=(2,2)))
model.add(Dropout(0.25))
model.add(Conv2D(filters = 64, kernel_size = (3,3),padding = 'Same', 
                 activation ='relu'))
model.add(Conv2D(filters = 64, kernel_size = (3,3),padding = 'Same', 
                 activation ='relu'))
model.add(MaxPool2D(pool_size=(2,2), strides=(2,2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(256, activation = "relu"))
model.add(Dropout(0.5))
model.add(Dense(10, activation = "softmax"))
model.summary()


**Compiling model**

In [None]:
# Increasing epochs to 30 may give better accuracy
model.compile(optimizer='adam', loss="categorical_crossentropy", metrics=["accuracy"])
history = model.fit(X_train, y_train, epochs=10, batch_size=85, validation_data=(X_val, y_val))

In [None]:
accuracy = history.history['accuracy']
val_accuracy = history.history['val_accuracy']
epochs = range(len(accuracy))
plt.plot(epochs, accuracy, 'bo', label='Training accuracy')
plt.plot(epochs, val_accuracy, 'b', label='Validation accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

print(model.evaluate(X_val, y_val))


In [None]:
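# predict_classes() was removed in recent TensorFlow/Keras; take the argmax of the softmax outputs
prediction = np.argmax(model.predict(X_test), axis=1)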
submit = pd.DataFrame(prediction,columns=["Label"])
submit["ImageId"] = pd.Series(range(1,(len(prediction)+1)))
submission = submit[["ImageId","Label"]]
submission.to_csv("submission.csv",index=False)


# Prophet


Prophet is a forecasting tool (originally released by Facebook) designed to make it easy for analysts to produce reliable forecasts. The workflow is:

1. Prophet takes its data as a dataframe with a ds (datestamp) column and a y (value we want to forecast) column, so first we convert the dataframe to that format.
2. Create an instance of the Prophet class and fit it to our dataframe.
3. Create a dataframe with the dates for which we want predictions using make_future_dataframe(), specifying the number of days to forecast with the periods parameter.
4. Call predict to make a prediction and store it in the forecast dataframe. You can inspect this dataframe to see the predictions as well as the lower and upper bounds of the uncertainty interval.


**Library and Data**

In [None]:
import plotly.offline as py
import plotly.express as px
# The package was renamed from fbprophet to prophet in version 1.0
from prophet import Prophet
from prophet.plot import plot_plotly, add_changepoints_to_plot

pred = pd.read_csv("../input/coronavirus-2019ncov/covid-19-all.csv")
pred = pred.fillna(0)
predgrp = pred.groupby("Date")[["Confirmed","Recovered","Deaths"]].sum().reset_index()
pred_cnfrm = predgrp.loc[:,["Date","Confirmed"]]
pr_data = pred_cnfrm
pr_data.columns = ['ds','y']
pr_data.head()

**Model and Forecast**

In [None]:
m=Prophet()
m.fit(pr_data)
future=m.make_future_dataframe(periods=15)
forecast=m.predict(future)
forecast


In [None]:
fig = plot_plotly(m, forecast)
py.iplot(fig) 

fig = m.plot(forecast,xlabel='Date',ylabel='Confirmed Count')

# ARIMA

**Library and Data**

In [None]:
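import datetime
# statsmodels.tsa.arima_model.ARIMA was removed in statsmodels 0.13; this is its replacement
from statsmodels.tsa.arima.model import ARIMA
ar = pd.read_csv("../input/competitive-data-science-predict-future-sales/sales_train.csv")
ar.date = ar.date.apply(lambda x: datetime.datetime.strptime(x, '%d.%m.%Y'))
# Total items sold per month (date_block_num 0..33 covers Jan 2013 to Oct 2015)
ar = ar.groupby(["date_block_num"])["item_cnt_day"].sum()
ar.index = pd.date_range(start='2013-01-01', end='2015-10-01', freq='MS')
ar = ar.reset_index()
ar = ar.loc[:, ["index", "item_cnt_day"]]
ar.columns = ['confirmed_date', 'count']
ar.head()

**Model**

In [None]:
# ARIMA(p=1, d=2, q=1); with d=2 the new API includes no constant term by default
model = ARIMA(ar['count'].values, order=(1, 2, 1))
fit_model = model.fit()
fit_model.summary()

**Prediction**

In [None]:
from statsmodels.graphics.tsaplots import plot_predict

# In-sample predictions with a confidence band
plot_predict(fit_model)
plt.title('Forecast vs Actual')
# Residuals should look like white noise if the model fits well
pd.DataFrame(fit_model.resid).plot()
# Forecast the next 6 months; the new API returns the point forecasts directly
forecast = fit_model.forecast(steps=6)
pred = pd.DataFrame(forecast.tolist())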


# **Evaluate Algorithms**
**Evaluating algorithms consists of the following three steps:**
1. Test Harness  
2. Explore and select algorithms 
3. Interpret and report results 
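A test harness usually means fixing one resampling scheme (for example k-fold cross-validation) and one metric, then running every candidate algorithm on exactly the same folds. The NumPy sketch below compares two deliberately simple, hypothetical candidates on synthetic data; in scikit-learn the same job is typically done by passing a shared KFold object to cross_val_score.

```python
import numpy as np

def k_fold_indices(n, k, seed=0):
    # A fixed seed gives every candidate exactly the same folds
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

def cross_val_accuracy(fit_predict, X, y, k=5):
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        train_mask = np.ones(len(X), dtype=bool)
        train_mask[test_idx] = False
        y_pred = fit_predict(X[train_mask], y[train_mask], X[test_idx])
        scores.append(np.mean(y_pred == y[test_idx]))
    return np.mean(scores), np.std(scores)

def majority_class(X_train, y_train, X_test):
    # Baseline: always predict the most frequent training label
    values, counts = np.unique(y_train, return_counts=True)
    return np.full(len(X_test), values[counts.argmax()])

def nearest_centroid(X_train, y_train, X_test):
    # Predict the class whose mean training point is closest
    classes = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[d.argmin(axis=1)]

# Synthetic data (illustration only): 60 points of class 0 and 40 of class 1
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (60, 2)), rng.normal(2.5, 1, (40, 2))])
y = np.array([0] * 60 + [1] * 40)

results = {name: cross_val_accuracy(fn, X, y) for name, fn in
           [("majority baseline", majority_class), ("nearest centroid", nearest_centroid)]}
for name, (mean, std) in results.items():
    print(f"{name}: {mean:.3f} +/- {std:.3f}")
```

Including a trivial baseline such as the majority class makes it easy to see whether a candidate algorithm is learning anything at all.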



# Data Analytics


We will do a detailed analysis of the campus placement dataset.

In [None]:
cp = pd.read_csv("../input/factors-affecting-campus-placement/Placement_Data_Full_Class.csv")
cp = cp.fillna(0)
cp.head()

# Bar Plot

**Bar chart of mean salary by gender, coloured by specialisation**

In [None]:
import plotly.express as px
grgs = cp.groupby(["gender","specialisation"])[["salary"]].mean().reset_index()
fig = px.bar(grgs[['gender', 'salary','specialisation']].sort_values('salary', ascending=False), 
             y="salary", x="gender", color='specialisation', 
             log_y=True, template='ggplot2')
fig.show()

# Pie Plot

**Pie chart of the mean degree percentage for each degree type**

In [None]:
grdsp = cp.groupby(["degree_t"])[["degree_p"]].mean().reset_index()

fig = px.pie(grdsp,
             values="degree_p",
             names="degree_t",
             template="seaborn")
fig.update_traces(rotation=90, pull=0.05, textinfo="percent+label")
fig.show()

# Tree Plot

**Treemap of the mean higher secondary (HSC) percentage for each HSC stream**

In [None]:
grss = cp.groupby(["hsc_s"])[["hsc_p"]].mean().reset_index()

fig = px.treemap(grss, path=['hsc_s'], values='hsc_p',
                  color='hsc_p', hover_data=['hsc_s'],
                  color_continuous_scale='rainbow')
fig.show()

# Scatter Plot

**The scatter plot shows salary against degree type**

In [None]:
plt.scatter(cp.degree_t,cp.salary)

# Line Chart

The line chart shows, for each higher secondary stream, the mean percentages in secondary (SSC), higher secondary (HSC) and degree.

In [None]:
grd = cp.groupby(["hsc_s"])[["hsc_p", "ssc_p", "degree_p"]].mean().reset_index()
f, ax = plt.subplots(figsize=(10, 5))

plt.plot(grd.hsc_s, grd.hsc_p, color="blue", label="HSC %")
plt.plot(grd.hsc_s, grd.ssc_p, color="black", label="SSC %")
plt.plot(grd.hsc_s, grd.degree_p, color="red", label="Degree %")
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.legend()

# **Density Plot**

**The density plot shows the distribution of salary**

In [None]:
import seaborn as sns
plt.figure(figsize=(10,6))
sns.set_style("darkgrid")
# `shade` was renamed to `fill` in seaborn 0.11
sns.kdeplot(data=cp['salary'], label="Salary", fill=True)

In [None]:
!pip install chart_studio


In [None]:
!pip install bubbly


# **Bubble Plot**

In [None]:
from bubbly.bubbly import bubbleplot 
from plotly.offline import iplot
import chart_studio.plotly as py
m = pd.read_csv("../input/global-hospital-beds-capacity-for-covid19/hospital_beds_USA_v1.csv")


figure = bubbleplot(dataset=m, x_column='beds', y_column='population', 
    bubble_column='state', size_column='beds', color_column='type', 
    x_logscale=True, scale_bubble=2, height=350)

iplot(figure)

# Heat Map

In [None]:
import seaborn as sns
f, ax = plt.subplots(figsize=(15,2))
h=pd.pivot_table(cp,columns='sl_no',values=["salary"])
sns.heatmap(h,cmap=['skyblue','red','green'],linewidths=0.05)


# **Folium Map**

In [None]:
m = pd.read_csv("../input/global-hospital-beds-capacity-for-covid19/hospital_beds_USA_v1.csv")

import folium
# Base map centred on the United States (avoid shadowing the built-in `map`)
us_map = folium.Map(location=[37.0902, -95.7129], zoom_start=4, tiles='cartodbpositron')

# One red circle per record, with the state shown in the popup
for lat, lon, state in zip(m['lat'], m['lng'], m['state']):
    folium.CircleMarker([lat, lon],
                        radius=5,
                        color='red',
                        popup='State: ' + str(state) + '<br>',
                        fill_color='red',
                        fill_opacity=0.7).add_to(us_map)
us_map

# Choropleth

In [None]:
fig = px.choropleth(m,
                    locations="state",
                    color="beds",
                    locationmode="USA-states",
                    scope="usa",
                    color_continuous_scale='Reds')

fig.show()