# <p style="text-align: center;">Countries_of_the_World_EDA </p>

In [None]:
from IPython.display import HTML
from IPython.display import Image
Image(url= "https://upload.wikimedia.org/wikipedia/commons/b/b4/2002_six-color_world_political_map.png")

### ABSTRACT   
##### [Reference](#1)

In this kernel, the "Countries of the World" dataset by Fernando Lasso is analyzed. The main focus of the project is GDP (Gross Domestic Product): which factors affect GDP per capita and, based on those effects, how to build a model from the data for the 227 countries in the dataset. The project also briefly explains how total GDP relates to the same factors. The key methods used for the analysis are correlation and linear regression. Our key finding is that GDP per capita is highly correlated with Literacy, Phones, Service, Infant mortality, Birthrate and Agriculture. 
This project is good practice for EDA and visualization. 
Exploratory Data Analysis (EDA) is the first step in the data analysis process. We take a broad look at patterns, trends, outliers, unexpected results and so on in the dataset, using visual and quantitative methods to get a sense of the story it tells. 

In [None]:
from IPython.display import HTML
HTML('''<script>
code_show=true; 
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
The raw code for this IPython notebook is by default hidden for easier reading.
To toggle on/off the raw code, click <a href="javascript:code_toggle()">here</a>.''')

In [None]:
# importing libraries 
%matplotlib inline 
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import stats
import seaborn as sns
import re
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_log_error
from sklearn.metrics import mean_squared_error
from subprocess import check_output
print(check_output(["ls", "../input"]).decode("utf8"))

In [None]:
# importing the dataset
df=pd.read_csv('../input/countries of the world.csv', decimal = ',')

### The first five rows of the dataset are shown below, giving a first look at the kind of data and the attributes it contains.

In [None]:
#first 5 rows of the data set to see what sort of data is there
df.head()

### Statistical analysis of given dataset

In [None]:
#statistical analysis of given data set
df.describe()

### Table Overview

We check whether our dataset has any missing values. If the result is True, it does; otherwise it doesn't.

In [None]:
# Checking for null values
print('Dataset has null values?')
df.isnull().values.any()

#### Now that we know that our dataset has missing values, we need to find the columns that contain them, along with the percentage of the dataset they affect.

Table Overview:- The following table lists each column that has missing values, with the number of missing values and the percentage of rows they represent

In [None]:
#Finding missing values in the data set 
total = df.isnull().sum()[df.isnull().sum() != 0].sort_values(ascending = False)
percent = pd.Series(round(total/len(df)*100,2))
pd.concat([total, percent], axis=1, keys=['total_missing', 'percent'])
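
The same summary can be sketched on a tiny frame to make the arithmetic visible. The frame `toy` and its values are made up for illustration; only the column names echo the real dataset.

```python
import numpy as np
import pandas as pd

# Toy frame (made-up values) standing in for the real dataset
toy = pd.DataFrame({
    'Country': ['A', 'B', 'C', 'D'],
    'Literacy (%)': [99.0, np.nan, 80.0, np.nan],
    'Climate': [1.0, 2.0, np.nan, 3.0],
})

missing_count = toy.isnull().sum()
# The mean of a boolean column is the fraction of True values, i.e. the share of missing rows
missing_pct = toy.isnull().mean() * 100

summary = pd.DataFrame({'total_missing': missing_count,
                        'percent': missing_pct.round(2)})
# Keep only columns that actually have gaps, largest first
summary = summary[summary['total_missing'] > 0].sort_values('total_missing', ascending=False)
print(summary)
```

Taking the mean of `isnull()` gives the missing fraction directly, so no division by `len(df)` is needed.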


### HOW TO FILL IN THE MISSING VALUES 

We know that our dataset contains missing values, but we don't know their collective effect on the dataset, that is, how they change the distribution and by how much. We could ignore the missing values and run our analysis anyway, but whether that is safe depends on that collective effect. Since we don't know it, it is better to fill in the missing values before the analysis, which gives more complete results.

In order to fill in the missing values, we need to know what sort of distribution our dependent variable has.

Plot overview:- 
The following bar plots show the distribution of "GDP per capita" across countries. From the plots we can tell which measure of central tendency to use to fill in our missing values.

1. Plot 1:- Top 33 Countries vs GDP per capita
2. Plot 2:- Last 33 Countries vs GDP per capita

In [None]:
#Sorting the values of GDP for different countries in descending order
top_gdp_countries = df.sort_values('GDP ($ per capita)',ascending=False)
#Visual Representation of the graph using seaborn for first 33 values
fig, ax = plt.subplots(figsize=(16,6))
sns.barplot(x='Country', y='GDP ($ per capita)', data=top_gdp_countries.head(33), palette='Set1')
ax.set_title('Top 33 Countries vs GDP per capita')
ax.set_xlabel(ax.get_xlabel(), labelpad=15)
ax.set_ylabel(ax.get_ylabel(), labelpad=30)
ax.xaxis.label.set_fontsize(16)
ax.yaxis.label.set_fontsize(16)
plt.xticks(rotation=90)
plt.show()


In [None]:
#Visual Representation of the graph using seaborn for last 33 values 
fig, ax = plt.subplots(figsize=(16,6))
sns.barplot(x='Country', y='GDP ($ per capita)', data=top_gdp_countries.tail(33), palette='Set1')
ax.set_title('Last 33 Countries vs GDP per capita')
ax.set_xlabel(ax.get_xlabel(), labelpad=15)
ax.set_ylabel(ax.get_ylabel(), labelpad=30)
ax.xaxis.label.set_fontsize(16)
ax.yaxis.label.set_fontsize(16)
plt.xticks(rotation=90)
plt.show()


### Which measure of central tendency are we using to fill in missing values?
Reasoning:- Looking at both plots, we can say that the distribution of GDP per capita across countries is right-skewed. For distributions that are skewed rather than normal, we use the median as the measure of central tendency. The mean is not used because it is strongly affected by outliers, which pull it toward the tail, whereas the median keeps its position and is not as strongly influenced by extreme values. The mode is generally used for categorical data, and it may not be unique: two or more values can share the highest frequency. So, we use the median to fill in our missing values.
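
A small sketch of the mean-versus-median argument, using made-up GDP-per-capita figures (not taken from the dataset) where one value is far larger than the rest:

```python
from statistics import mean, median

# Made-up values: most are low, one is very high (right skew)
gdp = [500, 800, 1000, 1200, 1500, 2000, 55000]

gdp_mean = mean(gdp)      # pulled upward by the single large value
gdp_median = median(gdp)  # stays at the middle observation

print('mean:', round(gdp_mean, 2))
print('median:', gdp_median)

# Count how many observations lie below the mean
below_mean = sum(v < gdp_mean for v in gdp)
print('values below the mean:', below_mean, 'of', len(gdp))
```

Here six of the seven values fall below the mean, so the mean would be a poor stand-in for a "typical" missing value, while the median sits in the middle of the data.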

We are going to group the data by region. Regions are areas that are broadly divided by physical characteristics (physical geography), human impact characteristics (human geography), and the interaction of humanity and the environment (environmental geography), and they consist of land and/or countries with similar attributes. So we group the independent variables (the attributes) by region and calculate the median of each. 

As for Climate, we use the mode because it is categorical data, and the mean and median would not make much sense for filling in its missing values.


In [None]:
df.groupby('Region')[['GDP ($ per capita)', 'Literacy (%)', 'Agriculture']].median()

### Missing values being filled in columns

In [None]:
#Missing values being filled in columns
for col in df.columns.values:
    if df[col].isnull().sum() == 0:
        continue
    if col == 'Climate':
        guess_values = df.groupby('Region')['Climate'].apply(lambda x: x.mode().max())
    else:
        guess_values = df.groupby('Region')[col].median()
    for region in df['Region'].unique():
        # assign through df.loc; chained indexing (df[col].loc[...] = ...) may write to a copy
        df.loc[df[col].isnull() & (df['Region'] == region), col] = guess_values[region]
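
As an alternative to the explicit loops, the same region-wise fill can be sketched with `groupby(...).transform`, which returns one fill value per row. The frame `toy` and its values are made up; this is a minimal sketch of the idea, not a replacement for the cell above.

```python
import numpy as np
import pandas as pd

# Made-up rows: two regions, each with one gap per column
toy = pd.DataFrame({
    'Region': ['X', 'X', 'X', 'Y', 'Y', 'Y'],
    'GDP ($ per capita)': [1000.0, np.nan, 3000.0, 20000.0, 30000.0, np.nan],
    'Climate': [1.0, 1.0, np.nan, 3.0, np.nan, 3.0],
})

# Numeric column: fill each gap with the median of its own region
region_median = toy.groupby('Region')['GDP ($ per capita)'].transform('median')
toy['GDP ($ per capita)'] = toy['GDP ($ per capita)'].fillna(region_median)

# Categorical column: fill with the region's mode (largest value on ties, as in the loop above)
region_mode = toy.groupby('Region')['Climate'].transform(lambda s: s.mode().max())
toy['Climate'] = toy['Climate'].fillna(region_mode)

print(toy)
```

Because `transform` aligns its result with the original index, `fillna` can consume it directly and no per-region loop is needed.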

In [None]:
print('Are there anymore null values?')

In [None]:
df.isnull().values.any()

Table Overview:- Showing columns with number of missing values

In [None]:
#check if we filled all missing values
print(df.isnull().sum())

Now that we have a complete dataset, that is, one without null values, we can employ various machine learning algorithms to see how our dependent and independent variables are related.

### 1. Correlation:- 
Correlation is any statistical association, though in common usage it most often refers to how close two variables are to having a linear relationship with each other.
The correlation coefficient r measures the strength and direction of a linear relationship between two variables on a scatterplot. 
r ranges from -1 to 1. If r>0 the variables are positively correlated (they rise together); if r<0 they are inversely correlated (one falls as the other rises). The closer |r| is to 1, the stronger the linear relationship.
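
To make the definition concrete, here is a minimal sketch that computes Pearson's r by hand and checks it against NumPy. The arrays and the helper name `pearson_r` are illustrative assumptions, not part of the dataset or the notebook.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_up = np.array([2.0, 4.0, 6.0, 8.0, 10.0])    # rises exactly with x
y_down = np.array([10.0, 8.0, 6.0, 4.0, 2.0])  # falls exactly with x

def pearson_r(a, b):
    """Covariance of a and b divided by the product of their standard deviations."""
    a_c = a - a.mean()
    b_c = b - b.mean()
    return (a_c * b_c).sum() / np.sqrt((a_c ** 2).sum() * (b_c ** 2).sum())

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
r_numpy = np.corrcoef(x, y_up)[0, 1]  # NumPy's built-in estimate for comparison

print(r_up, r_down, r_numpy)
```

A perfectly increasing line gives r close to 1 and a perfectly decreasing line gives r close to -1; real data, such as the columns analyzed here, falls somewhere in between.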

#### Table Overview:- Gives the correlation of each column with every other column in a dataframe

In [None]:
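#correlation (numeric columns only; recent pandas versions raise an error on text columns such as Country)
df.select_dtypes(include='number').corr()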

### Visual representation in form of heatmap for correlated data

In [None]:
#Visual representation in form of heatmap for correlated data
plt.figure(figsize=(16,12))
ax=plt.axes()
sns.heatmap(data=df.iloc[:,2:].corr(),annot=True,fmt='.2f',cmap='coolwarm',ax=ax)
ax.set_title('Heatmap showing correlated values for the Dataset')
plt.show()


### Looking at the heatmap, we can say that the following factors are positively correlated with GDP per capita:-

### r>0 for:-
1. Literacy (%) - 0.51
2. Phones (per 1000) - 0.83 (Highly correlated)
3. Service - 0.55

### Following values are inversely correlated with GDP per capita:-
### r<0 for:-
1. Infant mortality (per 1000 births) - -0.6
2. Birthrate - -0.64 (Highly negatively correlated)
3. Agriculture - -0.59

#### Heatmap for the attributes most strongly correlated with GDP per capita

In [None]:
# choose attributes which shows relation
x = df[['GDP ($ per capita)','Literacy (%)','Phones (per 1000)','Service','Infant mortality (per 1000 births)','Birthrate','Agriculture']]

In [None]:
# show corr of the same
plt.figure(figsize=(10,5))
ax=plt.axes()
sns.heatmap(x.corr(), annot=True,ax=ax)
ax.set_title('Heatmap showing correlated values for the Dataset')
plt.show()


### Scatterplot

A scatterplot uses dots to represent the values of two variables, the independent variable against the dependent variable (x vs y), which is factors vs GDP per capita in this case. It shows how strongly the two variables are linearly related. 

Since some factors are inversely correlated with GDP per capita, we rank the factors by the absolute value of the correlation coefficient and plot the nine strongest.

In [None]:
#scatter plot to show correlation between GDP and other attributes
fig, axes = plt.subplots(nrows=3, ncols=3, figsize=(23,20))
plt.subplots_adjust(hspace=0.4)

corr_to_gdp = pd.Series(dtype=float)
for col in df.columns.values[2:]:
    if ((col!='GDP ($ per capita)')&(col!='Climate')&(col!='Coastline (coast/area ratio)') &(col!='Pop. Density (per sq. mi.)')):
        corr_to_gdp[col] = df['GDP ($ per capita)'].corr(df[col])
abs_corr_to_gdp = corr_to_gdp.abs().sort_values(ascending=False)
corr_to_gdp = corr_to_gdp.loc[abs_corr_to_gdp.index]


for i in range(3):
    for j in range(3):
        sns.regplot(x=corr_to_gdp.index.values[i*3+j], y='GDP ($ per capita)', data=df,
                   ax=axes[i,j], fit_reg=False, marker='.')
        title = 'correlation='+str(corr_to_gdp.iloc[i*3+j])
        axes[i,j].set_title(title)
axes[1,2].set_xlim(0,102)
fig.suptitle('Scatterplot between GDP per capita and factors', fontsize='30')
plt.show()
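
The ranking step used in the cell above (sort by |r|, keep the sign) can be sketched in isolation. The coefficients below are the ones quoted in the heatmap discussion earlier in this notebook; the variable names are illustrative.

```python
import pandas as pd

# Coefficients quoted from the heatmap discussion above
corr = pd.Series({
    'Literacy (%)': 0.51,
    'Phones (per 1000)': 0.83,
    'Service': 0.55,
    'Infant mortality (per 1000 births)': -0.60,
    'Birthrate': -0.64,
    'Agriculture': -0.59,
})

# Sort by strength (absolute value), then look the signed values back up in that order
order = corr.abs().sort_values(ascending=False).index
ranked = corr.loc[order]
print(ranked)
```

Sorting on the absolute value puts strong negative factors such as Birthrate next to strong positive ones, while `ranked` still shows the direction of each relationship.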

### Pairplot
A pair plot (also known as a scatterplot matrix) shows both the distribution of each single variable and the relationships between pairs of variables (here GDP per capita, Phones (per 1000) and Service). The points are colored by region.

The histograms on the diagonal give the distribution of a single variable, while the scatter plots in the upper and lower triangles show the relationship between two variables.

In [None]:
x = df[['GDP ($ per capita)','Phones (per 1000)','Service','Region']]

g=sns.pairplot(x, hue="Region", diag_kind='hist')
g.fig.suptitle('Pairplot showing GDP per capita, Services and Phones per(1000)',y=1.05)





### CORRELATED ATTRIBUTES

#### 1. Distplot Distribution

#### Positively Correlated Attribute
The following figure plots the density of the positively correlated factors (where r>0), together with GDP per capita itself. Each panel is a univariate distribution.

In [None]:
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(23,20))
plt.subplots_adjust(hspace=0.4)

# GDP per capita plus the factors positively correlated with it
positive_cols = ['GDP ($ per capita)', 'Literacy (%)', 'Phones (per 1000)', 'Service']

for ax, col in zip(axes.flat, positive_cols):
    print(col)
    # distplot is deprecated in newer seaborn releases; histplot(..., kde=True) is its successor
    sns.distplot(df[col], axlabel=col, ax=ax)

fig.suptitle('Univariate Distribution of Positively Correlated Factors', fontsize='25')
plt.show()

####  Negatively Correlated Attribute
The following figure plots the density of the negatively correlated factors (where r<0). Each panel is a univariate distribution.

In [None]:
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(10,4))
plt.subplots_adjust(hspace=0.1)

# Factors negatively correlated with GDP per capita
negative_cols = ['Infant mortality (per 1000 births)', 'Birthrate', 'Agriculture']

for ax, col in zip(axes, negative_cols):
    # distplot is deprecated in newer seaborn releases; histplot(..., kde=True) is its successor
    sns.distplot(df[col], ax=ax, axlabel=col)

fig.suptitle('Univariate Distribution of Negatively Correlated Factors', fontsize='20')
plt.show()
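
Skewness can also be read off as a number rather than judged from a density plot. The sketch below uses `Series.skew()` from pandas on made-up values; a positive result indicates a longer right tail, and a value near zero indicates a symmetric distribution.

```python
import pandas as pd

# Made-up, right-skewed values: a long tail toward large numbers
skewed = pd.Series([500, 800, 1000, 1200, 1500, 2000, 55000], dtype=float)
# Made-up, symmetric values for comparison
symmetric = pd.Series([1, 2, 3, 4, 5], dtype=float)

skew_right = skewed.skew()
skew_sym = symmetric.skew()

print('right-skewed sample:', skew_right)
print('symmetric sample:', skew_sym)
```

Applied to the notebook's dataframe, something like `df[col].skew()` for each plotted column would give a numeric check on the shapes seen in the density plots.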

#### 2. Boxplot
It is often used in exploratory data analysis to show the shape of the distribution, its central value, and its variability. The following figure gives the boxplots for the three factors most positively correlated and the three most negatively correlated with GDP per capita.

In [None]:
fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(23,20))
plt.subplots_adjust(hspace=0.4)

# The three positively and three negatively correlated factors, in dataset column order
box_cols = ['Infant mortality (per 1000 births)', 'Literacy (%)', 'Phones (per 1000)',
            'Birthrate', 'Agriculture', 'Service']

for ax, col in zip(axes.flat, box_cols):
    sns.boxplot(x=df[col].values, ax=ax)
    ax.set_title(col)
# limit the x-axis of the first panel (infant mortality)
axes[0,0].set_xlim(0,175)

fig.suptitle('Boxplot Distribution for Correlated Attributes', fontsize='30')
      
plt.show()
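
The quantities a boxplot draws can be computed directly. The sketch below uses made-up values and the 1.5 × IQR whisker rule, which is the default in seaborn and matplotlib; points beyond the fences are the ones drawn as individual outliers.

```python
import numpy as np

# Made-up values with one obvious outlier
data = np.array([2.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 11.0, 40.0])

# Box edges and center line
q1, med, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Whisker limits under the 1.5 * IQR rule
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = data[(data < lower_fence) | (data > upper_fence)]
print('Q1, median, Q3:', q1, med, q3)
print('fences:', lower_fence, upper_fence)
print('outliers:', outliers)
```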


### Overview of our Dataset

In [None]:
df.head(5)

### Data Modeling using Linear Regression

From the dataset shown above, we know that two of the columns, Region and Climate, are categorical. Non-numeric categories must be converted to numeric values before we can run machine learning algorithms on them. For that reason we use LabelEncoder, applied to Region in the cell below.

Label encoding converts each distinct value of a categorical column to a number.

Table Overview:- The following table shows the non-numeric Region column alongside its encoded numeric values

In [None]:
LE = LabelEncoder()
df['Regional_label'] = LE.fit_transform(df['Region'])
df1 = df[['Region','Regional_label']]
df1.head(5)
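
A minimal sketch of what `LabelEncoder` does, using hypothetical region labels (the real Region strings in the dataset differ): classes are sorted, each gets an integer code, and `inverse_transform` maps codes back to labels.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels, for illustration only
regions = ['OCEANIA', 'BALTICS', 'OCEANIA', 'ASIA']

le = LabelEncoder()
codes = le.fit_transform(regions)

classes = list(le.classes_)                     # distinct labels in sorted order
decoded = list(le.inverse_transform(codes))     # codes mapped back to labels

print(classes)
print(codes.tolist())
print(decoded)
```

Note that the integer codes follow alphabetical order, not any meaningful ranking of regions. A linear model treats them as magnitudes, so one-hot encoding is a common alternative when that ordering is not wanted.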

[Reference](#2)

### Linear Regression

Linear regression is a linear approach to modeling the relationship between a scalar response (the dependent variable), here GDP per capita, and one or more explanatory variables (independent variables), here literacy rate, services and so on. 

### Multiple Linear Regression

#### The case of multiple explanatory variables (independent variables) is called multiple linear regression.
To build a well-performing machine learning (ML) model, it is important to separate the data into training and testing datasets. We train the model on one part and test it against another part that comes from the same distribution. 

In [None]:
train, test = train_test_split(df, test_size=0.3, shuffle=True)
training_features = ['Population', 'Area (sq. mi.)',
       'Pop. Density (per sq. mi.)', 'Coastline (coast/area ratio)',
       'Net migration', 'Infant mortality (per 1000 births)',
       'Literacy (%)', 'Phones (per 1000)',
       'Arable (%)', 'Crops (%)', 'Other (%)', 'Birthrate',
       'Deathrate', 'Agriculture', 'Industry', 'Service', 'Regional_label']
target = 'GDP ($ per capita)'
train_X = train[training_features]
train_Y = train[target]
test_X = test[training_features]
test_Y = test[target]
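
Because the split above is shuffled without a seed, the reported errors will change from run to run. A minimal sketch, on made-up arrays, of how `random_state` makes `train_test_split` reproducible (the value 0 is an arbitrary choice):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 10 samples with 2 features each
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Same random_state -> same shuffle -> same split on every run
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, shuffle=True, random_state=0)
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(X, y, test_size=0.3, shuffle=True, random_state=0)

print('train size:', len(X_tr), 'test size:', len(X_te))
print('identical test sets:', bool((X_te == X_te2).all()))
```

With `test_size=0.3`, 3 of the 10 rows go to the test set and the remaining 7 to training, and every row lands in exactly one of the two.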


#### We applied a linear regression model to our dataset and calculated the Root Mean Squared Error and the Mean Squared Logarithmic Error.

Root Mean Squared Error:-
Root Mean Square Error (RMSE) is the standard deviation of the residuals. A residual measures how far a data point lies from the regression line fitted on the training dataset, and RMSE measures how spread out these residuals are. In other words, it tells you how concentrated the data is around the line of best fit.

Mean Squared Error:-
The Mean Squared Error (MSE) is a measure of how close a fitted line is to data points. It is the sum, over all the data points, of the square of the difference between the predicted and actual target variables, divided by the number of data points. RMSE is the square root of MSE. The Mean Squared Logarithmic Error (MSLE) reported below is computed the same way, but on log(1 + value) of the predicted and actual targets, so it measures relative rather than absolute error.
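
The three metrics can be worked out by hand with NumPy. The values below are made up; `np.log1p(v)` computes log(1 + v), which is the transform used by scikit-learn's `mean_squared_log_error`.

```python
import numpy as np

# Made-up actual and predicted GDP-per-capita values
y_true = np.array([1000.0, 5000.0, 20000.0])
y_pred = np.array([1500.0, 4000.0, 23000.0])

# MSE: mean of squared differences
mse = np.mean((y_true - y_pred) ** 2)
# RMSE: square root of MSE, in the same units as the target (dollars here)
rmse = np.sqrt(mse)
# MSLE: mean of squared differences between log(1 + value) terms
msle = np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)

print('MSE:', mse)
print('RMSE:', rmse)
print('MSLE:', msle)
```

RMSE is dominated by the largest absolute miss (3000 on the richest country), whereas MSLE weights the 50% miss on the poorest country most heavily. That difference is worth keeping in mind when judging whether an RMSE value is "low".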



#### Calculated Value for RMSE AND MSLE  [Reference](#1)

In [None]:
model = LinearRegression()
model.fit(train_X, train_Y)
train_pred_Y = model.predict(train_X)
test_pred_Y = model.predict(test_X)
train_pred_Y = pd.Series(train_pred_Y.clip(0, train_pred_Y.max()), index=train_Y.index)
test_pred_Y = pd.Series(test_pred_Y.clip(0, test_pred_Y.max()), index=test_Y.index)

# scikit-learn metrics take (y_true, y_pred); both metrics are symmetric, so the values are unchanged
rmse_train = np.sqrt(mean_squared_error(train_Y, train_pred_Y))
msle_train = mean_squared_log_error(train_Y, train_pred_Y)
rmse_test = np.sqrt(mean_squared_error(test_Y, test_pred_Y))
msle_test = mean_squared_log_error(test_Y, test_pred_Y)

#q=model.score(rmse_test,rmse_train)

print('rmse_train: %.2f '% (rmse_train),'msle_train: %.2f ' %(msle_train))
print('rmse_test: %.2f ' %(rmse_test),'msle_test:%.2f ' %(msle_test))

### Simple Linear Regression

The case of a single explanatory variable (independent variable) is called simple linear regression. [1](#1)

#### Linear Regression using a positively correlated factor(highest)

In [None]:
train, test = train_test_split(df, test_size=0.3, shuffle=True)
training_features = ['Phones (per 1000)']
target = 'GDP ($ per capita)'
train_X = train[training_features]
train_Y = train[target]
test_X = test[training_features]
test_Y = test[target]


In [None]:
model = LinearRegression()
model.fit(train_X, train_Y)
train_pred_Y = model.predict(train_X)
test_pred_Y = model.predict(test_X)
train_pred_Y = pd.Series(train_pred_Y.clip(0, train_pred_Y.max()), index=train_Y.index)
test_pred_Y = pd.Series(test_pred_Y.clip(0, test_pred_Y.max()), index=test_Y.index)

# scikit-learn metrics take (y_true, y_pred); both metrics are symmetric, so the values are unchanged
rmse_train = np.sqrt(mean_squared_error(train_Y, train_pred_Y))
msle_train = mean_squared_log_error(train_Y, train_pred_Y)
rmse_test = np.sqrt(mean_squared_error(test_Y, test_pred_Y))
msle_test = mean_squared_log_error(test_Y, test_pred_Y)

print('rmse_train:%.2f '%(rmse_train),'msle_train:%.2f '%(msle_train))
print('rmse_test:%.2f '% (rmse_test),'msle_test:%.2f '%(msle_test))

plt.scatter(test_X, test_Y, color = 'red')
plt.plot(train_X, train_pred_Y, color = 'blue')
plt.xlabel('Phones per 1000')
plt.ylabel('GDP per capita')
plt.title('Linear Regression between Phones per 1000 and GDP per capita')
plt.show()
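
For a single predictor, the least-squares line has a closed form: the slope is cov(x, y) / var(x) and the intercept is mean(y) - slope × mean(x). The sketch below checks this against `np.polyfit` on made-up phone and GDP figures (not taken from the dataset).

```python
import numpy as np

# Made-up values: phones per 1000 people and GDP per capita
phones = np.array([10.0, 100.0, 250.0, 400.0, 600.0])
gdp = np.array([900.0, 3500.0, 9000.0, 15000.0, 24000.0])

# Closed-form ordinary least squares for one predictor
slope = np.cov(phones, gdp, ddof=1)[0, 1] / np.var(phones, ddof=1)
intercept = gdp.mean() - slope * phones.mean()

# np.polyfit with deg=1 returns [slope, intercept] for the same least-squares line
slope_np, intercept_np = np.polyfit(phones, gdp, deg=1)

print('closed form:', slope, intercept)
print('polyfit:    ', slope_np, intercept_np)
```

`LinearRegression` fits the same line when given one feature, so its `coef_[0]` and `intercept_` should agree with these values on the same data.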


### Calculating the total GDP and Plotting top 10 countries with highest Total GDP

In [None]:
df['Total_GDP ($)'] = df['GDP ($ per capita)'] * df['Population']
top_gdp_countries = df.sort_values('Total_GDP ($)',ascending=False).head(10)
other = pd.DataFrame({'Country':['Other'], 'Total_GDP ($)':[df['Total_GDP ($)'].sum() - top_gdp_countries['Total_GDP ($)'].sum()]})
gdps = pd.concat([top_gdp_countries[['Country','Total_GDP ($)']],other],ignore_index=True)

fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(20,7),gridspec_kw = {'width_ratios':[2,1]})
sns.barplot(x='Country',y='Total_GDP ($)',data=gdps,ax=axes[0],palette='Set2')
axes[0].set_xlabel('Country',labelpad=30,fontsize=16)
axes[0].set_ylabel('Total_GDP',labelpad=30,fontsize=16)

colors = sns.color_palette("Set2", gdps.shape[0]).as_hex()
axes[1].pie(gdps['Total_GDP ($)'], labels=gdps['Country'],colors=colors,autopct='%1.1f%%',shadow=True)
axes[1].axis('equal')
plt.show()

Table Overview:- Rank of countries on basis of total GDP and Rank of countries on basis of GDP per capita

In [None]:
Rank_total_gdp = df[['Country','Total_GDP ($)']].sort_values('Total_GDP ($)', ascending=False).reset_index()
Rank_gdp = df[['Country','GDP ($ per capita)']].sort_values('GDP ($ per capita)', ascending=False).reset_index()
Rank_total_gdp= pd.Series(Rank_total_gdp.index.values+1, index=Rank_total_gdp.Country)
Rank_gdp = pd.Series(Rank_gdp.index.values+1, index=Rank_gdp.Country)
Rank_change = (Rank_gdp-Rank_total_gdp).sort_values(ascending=False)
print('rank of GDP per capita - rank of total GDP:')
Rank_change.loc[top_gdp_countries.Country]
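
The rank comparison can also be sketched with `Series.rank`, which avoids building the rank Series by hand. The three countries and their figures below are invented for illustration.

```python
import pandas as pd

# Invented countries: a small rich one, a large poor one, and one in between
toy = pd.DataFrame({
    'Country': ['Small-rich', 'Large-poor', 'Mid'],
    'GDP ($ per capita)': [50000.0, 2000.0, 10000.0],
    'Population': [500_000, 1_000_000_000, 50_000_000],
}).set_index('Country')

toy['Total_GDP ($)'] = toy['GDP ($ per capita)'] * toy['Population']

# Rank 1 = largest value
rank_per_capita = toy['GDP ($ per capita)'].rank(ascending=False).astype(int)
rank_total = toy['Total_GDP ($)'].rank(ascending=False).astype(int)

# Positive change = the country climbs when ranked by total GDP instead of per capita
rank_change = rank_per_capita - rank_total
print(rank_change)
```

The large, poor country moves up two places when ranked by total GDP, which mirrors the pattern the notebook reports for India and China.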

### Correlation between Total GDP and factors

In [None]:
plt.figure(figsize=(16,12))
ax=plt.axes()
y=df[df.columns[2:]].apply(lambda x: x.corr(df['Total_GDP ($)']))
print(y)
sns.heatmap(data=df.iloc[:,2:].corr(),annot=True,fmt='.2f',cmap='coolwarm',ax=ax)
ax.set_title('Heatmap showing correlated values for the Dataset with respect to total GDP')
plt.show()
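
The column-by-column correlation computed with `apply` above can be written more directly with `DataFrame.corrwith`. A minimal sketch on made-up numbers (populations in millions; all values invented):

```python
import pandas as pd

toy = pd.DataFrame({
    'Population': [1.0, 5.0, 20.0, 100.0, 300.0],   # millions, made-up
    'GDP ($ per capita)': [40000.0, 30000.0, 5000.0, 8000.0, 3000.0],
    'Literacy (%)': [99.0, 98.0, 70.0, 85.0, 60.0],
})
toy['Total_GDP ($)'] = toy['GDP ($ per capita)'] * toy['Population'] * 1e6

# Correlate every other column with total GDP in one call
corr_with_total = toy.drop(columns='Total_GDP ($)').corrwith(toy['Total_GDP ($)'])
print(corr_with_total.sort_values(ascending=False))
```

`corrwith` returns one coefficient per column, which is usually easier to read than picking the Total_GDP row out of a full heatmap.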

# Conclusions
1. GDP per capita is right-skewed, and therefore the median is the appropriate measure of central tendency.
2. GDP per capita is highly correlated with phones, services and literacy rate (positively) and with infant mortality rate, agriculture and birthrate (negatively).
3. When grouped by region, GDP per capita is positively correlated with phones and services: regions where people tend to have more phones tend to have higher GDP per capita, and the larger the services share, the higher the GDP per capita.
4. For the highly correlated factors, the density distribution is mostly skewed. 
5. Climate has no effect on GDP per capita.
6. For multiple linear regression, we computed RMSE and MSLE on the test data. The RMSE is low (in the range of 55000), which suggests the model is a good predictor, so we can make theoretical claims from it. The MSLE is also low, which suggests the model is a good estimator, that is, the test data fits the regression line well.
7. For simple linear regression (phones per 1000 vs GDP per capita), both RMSE and MSLE are good. Phones per 1000 is a good predictor of GDP per capita.
8. By total GDP, countries like India and China, which have low GDP per capita (rank 146 and rank 118 respectively), jump to positions 4 and 2 respectively. This shows that although their GDP per capita is low, they have high purchasing power (total GDP).
9. Countries with high total GDP are quite different from countries with high GDP per capita.
10. Total GDP is highly correlated with Area and Population.
11. Factors that were highly correlated with GDP per capita have almost no effect on total GDP, except for Phones per 1000, which has a correlation of 0.23.


# Contributions

The dataset had many null values, so I cleaned the data. A few plots were not presented well, so I aligned them with my code (Barplot & Distplot). I then ran linear regression on the dataset and calculated the goodness of the model. Additionally, I calculated the total GDP per country, compared it with GDP per capita and found interesting results. I also computed a simple correlation between total GDP and the factors. All in all, most of the code was given to us, but I contributed around 30% of the code for this assignment.


# Citations

1. https://www.kaggle.com/stieranka/predicting-gdp-world-countries/notebook
<a id='1'></a>
2. https://www.youtube.com/watch?v=E5RjzSK0fvY&feature=youtu.be
<a id='2'></a>
3. https://github.com/nikbearbrown?tab=repositories
4. https://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html
5. https://docs.python.org/3/library/

# License

Copyright (c) 2019 Manali Sharma

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.