### Dependencies for the Project

In [1]:
import pandas as pd


### Importing the CSV files for Behavior and Attitudes and for Policies

In [86]:
file = "./Behavior_and_Attitudes.csv"
file2 = "./policies.csv"
df_BA = pd.read_csv(file)
df_p = pd.read_csv(file2)
df_p.head()

Unnamed: 0,code,economy,year,Financing for entrepreneurs,Governmental support and policies,Taxes and bureaucracy,Governmental programs,Basic school entrepreneurial education and training,Post school entrepreneurial education and training,R&D transfer,Commercial and professional infrastructure,Internal market dynamics,Internal market openness,Physical and services infrastructure,Cultural and social norms
0,374,Armenia,2019,2.48,2.69,3.19,2.42,2.06,2.37,2.16,3.35,3.12,2.76,3.91,3.55
1,61,Australia,2019,3.1,2.62,2.7,2.82,2.43,2.81,2.52,3.09,2.7,2.88,3.53,3.15
2,375,Belarus,2019,2.21,2.28,2.73,2.12,2.0,2.86,2.3,3.1,3.28,2.67,4.08,2.39
3,55,Brazil,2019,2.93,2.5,1.87,2.56,1.78,2.67,2.25,2.82,3.44,2.49,3.23,2.47
4,359,Bulgaria,2019,2.75,1.89,2.8,2.07,1.95,2.53,2.17,3.03,3.1,2.68,4.09,2.5


### Exploring the Database

#### Number of unique countries in the survey

In [23]:
print(f"No of unique countries in the survey : {len(df_BA['economy'].unique())}")

No of unique countries in the survey : 113


#### Number of economies surveyed each year

Not every country was surveyed in every year. 2001 had the fewest economies in the survey (28), while 2013 and 2014 had the most (70 each). The latest year, 2019, has 50 economies surveyed.

In [25]:
df_BA["year"].value_counts()

2014    70
2013    70
2012    67
2016    64
2015    60
2010    59
2011    55
2009    54
2017    54
2019    50
2018    49
2008    43
2007    42
2006    42
2002    37
2005    35
2004    34
2003    31
2001    28
Name: year, dtype: int64
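
The counts above are ordered by frequency, which makes the trend over time harder to read. As a minimal sketch, the same counts can be sorted chronologically and the extreme years picked out programmatically. The `toy` frame below is a made-up stand-in for `df_BA`, not survey data.

```python
import pandas as pd

# Toy stand-in for df_BA; the rows are illustrative only, not survey data.
toy = pd.DataFrame({
    "economy": ["A", "B", "C", "A", "B", "A"],
    "year": [2001, 2001, 2001, 2002, 2002, 2003],
})

# Count economies per year, then order chronologically instead of by count.
per_year = toy["year"].value_counts().sort_index()

# Years with the smallest and largest number of surveyed economies.
min_years = per_year[per_year == per_year.min()].index.tolist()
max_years = per_year[per_year == per_year.max()].index.tolist()

print(per_year)
print(f"Fewest economies: {min_years}, most economies: {max_years}")
```

Returning lists for the extreme years handles ties, such as 2013 and 2014 sharing the maximum in the real data.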

#### Null Values

The dataset has null values in certain columns, which are identified below. The columns with null values are:
1. Fear of failure rate * 
2. Entrepreneurial intentions
3. Established Business Ownership 
4. Entrepreneurial Employee Activity
5. Motivational Index 
6. Female/Male Opportunity-Driven TEA 
7. High Job Creation Expectation
8. Innovation
9. Business Services Sector
10. High Status to Successful Entrepreneurs
11. Entrepreneurship as a Good Career Choice 

In [26]:
# identifying missing values
df_BA.count()

code                                                944
economy                                             944
year                                                944
Perceived opportunities                             944
Perceived capabilities                              944
Fear of failure rate *                              943
Entrepreneurial intentions                          916
Total early-stage Entrepreneurial Activity (TEA)    944
Established Business Ownership                      943
Entrepreneurial Employee Activity                   458
Motivational Index                                  548
Female/Male TEA                                     944
Female/Male Opportunity-Driven TEA                  367
High Job Creation Expectation                       941
Innovation                                          489
Business Services Sector                            906
High Status to Successful Entrepreneurs             837
Entrepreneurship as a Good Career Choice        
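
`count()` reports non-null entries, so the number of nulls has to be inferred by comparing each value against the total row count. One possible alternative, sketched here on a made-up `toy` frame (an assumption, not the survey data), is to count the nulls directly with `isna().sum()` and keep only the columns that have any.

```python
import pandas as pd
import numpy as np

# Toy stand-in for df_BA; values are illustrative only.
toy = pd.DataFrame({
    "economy": ["A", "B", "C"],
    "Innovation": [1.0, np.nan, np.nan],
    "Motivational Index": [2.0, 3.0, np.nan],
})

# isna().sum() gives the number of nulls per column directly.
missing = toy.isna().sum()

# Keep only the columns that actually contain nulls, largest first.
missing = missing[missing > 0].sort_values(ascending=False)
print(missing)
```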

### Fear of failure rate--dealing with null value

In [27]:
# Fear of failure rate has just one null value; identify the row (economy) that contains it

df_BA.loc[df_BA["Fear of failure rate *"].isna()]

Unnamed: 0,code,economy,year,Perceived opportunities,Perceived capabilities,Fear of failure rate *,Entrepreneurial intentions,Total early-stage Entrepreneurial Activity (TEA),Established Business Ownership,Entrepreneurial Employee Activity,Motivational Index,Female/Male TEA,Female/Male Opportunity-Driven TEA,High Job Creation Expectation,Innovation,Business Services Sector,High Status to Successful Entrepreneurs,Entrepreneurship as a Good Career Choice
736,582,Venezuela,2007,56.04,66.24,,20.68,20.16,5.39,,,0.72,,25.62,,6.72,71.7,76.57


In [28]:
# Pull all the data points for Venezuela
df_BA.loc[df_BA["economy"]=="Venezuela"]

Unnamed: 0,code,economy,year,Perceived opportunities,Perceived capabilities,Fear of failure rate *,Entrepreneurial intentions,Total early-stage Entrepreneurial Activity (TEA),Established Business Ownership,Entrepreneurial Employee Activity,Motivational Index,Female/Male TEA,Female/Male Opportunity-Driven TEA,High Job Creation Expectation,Innovation,Business Services Sector,High Status to Successful Entrepreneurs,Entrepreneurship as a Good Career Choice
538,582,Venezuela,2011,48.45,66.86,24.15,20.23,15.43,1.57,0.63,1.52,0.88,,,13.44,6.49,77.26,83.06
650,582,Venezuela,2009,48.21,59.31,25.61,28.7,18.66,6.51,,,0.91,,23.78,,,68.88,76.24
736,582,Venezuela,2007,56.04,66.24,,20.68,20.16,5.39,,,0.72,,25.62,,6.72,71.7,76.57
813,582,Venezuela,2005,65.15,74.94,30.95,40.0,24.95,8.59,,,0.91,,30.0,,8.67,77.0,84.32
878,582,Venezuela,2003,43.04,82.14,30.73,37.28,26.81,9.63,,,0.88,,28.36,,10.08,73.01,79.67


#### Treating the one null value in Fear of Failure rate
Venezuela has five data points, so the one null value can be filled with the mean of its other four fear of failure rate values.

In [51]:
# Mean fear of failure rate for Venezuela, excluding the 2007 row that holds the null

mean_ffrate = df_BA.loc[(df_BA["economy"] == "Venezuela") & (df_BA["year"] != 2007), "Fear of failure rate *"].mean()

print(f"The data is updated with the mean value {mean_ffrate}")

# Fill the null in the DataFrame (the column has only this one null value)

df_BA["Fear of failure rate *"] = df_BA["Fear of failure rate *"].fillna(mean_ffrate)

#Displaying the DF with the changes made

df_BA.loc[df_BA["economy"]=="Venezuela"]

The data is updated with the mean value 27.86


Unnamed: 0,code,economy,year,Perceived opportunities,Perceived capabilities,Fear of failure rate *,Entrepreneurial intentions,Total early-stage Entrepreneurial Activity (TEA),Established Business Ownership,Entrepreneurial Employee Activity,Motivational Index,Female/Male TEA,Female/Male Opportunity-Driven TEA,High Job Creation Expectation,Innovation,Business Services Sector,High Status to Successful Entrepreneurs,Entrepreneurship as a Good Career Choice
538,582,Venezuela,2011,48.45,66.86,24.15,20.23,15.43,1.57,0.63,1.52,0.88,,,13.44,6.49,77.26,83.06
650,582,Venezuela,2009,48.21,59.31,25.61,28.7,18.66,6.51,,,0.91,,23.78,,,68.88,76.24
736,582,Venezuela,2007,56.04,66.24,27.86,20.68,20.16,5.39,,,0.72,,25.62,,6.72,71.7,76.57
813,582,Venezuela,2005,65.15,74.94,30.95,40.0,24.95,8.59,,,0.91,,30.0,,8.67,77.0,84.32
878,582,Venezuela,2003,43.04,82.14,30.73,37.28,26.81,9.63,,,0.88,,28.36,,10.08,73.01,79.67
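
Filling the whole column with Venezuela's mean works here because the column has a single null. If more economies had gaps, a per-economy fill would avoid leaking one country's mean into another. The following is a sketch under that assumption, not the approach used above: the Venezuela values are the ones shown in the output, while the `Other` rows are invented for illustration.

```python
import pandas as pd
import numpy as np

col = "Fear of failure rate *"

# Venezuela values come from the output above; the "Other" rows are made up.
toy = pd.DataFrame({
    "economy": ["Venezuela"] * 5 + ["Other"] * 2,
    "year": [2011, 2009, 2007, 2005, 2003, 2007, 2009],
    col: [24.15, 25.61, np.nan, 30.95, 30.73, 40.0, np.nan],
})

# Mean of the non-null values within each economy, aligned to the original rows.
economy_mean = toy.groupby("economy")[col].transform("mean")

# Fill each null with the mean of its own economy.
toy[col] = toy[col].fillna(economy_mean)
print(toy)
```

`transform("mean")` skips nulls when averaging and returns a Series with the same index as `toy`, which is what lets `fillna` line the values up row by row.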


### Entrepreneurial intentions--dealing with null value

The 2001 survey might not have included questions on entrepreneurial intentions, so this data point is null for all 28 economies surveyed that year.


In [55]:
#identifying the null values

df_BA.loc[df_BA["Entrepreneurial intentions"].isna()]

Unnamed: 0,code,economy,year,Perceived opportunities,Perceived capabilities,Fear of failure rate *,Entrepreneurial intentions,Total early-stage Entrepreneurial Activity (TEA),Established Business Ownership,Entrepreneurial Employee Activity,Motivational Index,Female/Male TEA,Female/Male Opportunity-Driven TEA,High Job Creation Expectation,Innovation,Business Services Sector,High Status to Successful Entrepreneurs,Entrepreneurship as a Good Career Choice
916,54,Argentina,2001,19.83,54.8,37.17,,9.92,3.92,,,0.38,,17.81,,16.72,,
917,61,Australia,2001,31.27,59.66,38.18,,14.68,27.96,,,0.48,,28.79,,30.32,,
918,32,Belgium,2001,19.96,30.32,37.87,,4.19,3.02,,,0.51,,13.54,,18.85,,
919,55,Brazil,2001,40.6,54.28,32.14,,13.8,3.79,,,0.62,,14.44,,9.7,,
920,101,Canada,2001,34.54,53.03,26.89,,10.27,3.89,,,0.63,,12.37,,27.06,,
921,45,Denmark,2001,45.78,40.81,23.72,,7.23,4.25,,,0.43,,3.48,,43.89,,
922,358,Finland,2001,54.91,37.61,35.17,,8.16,7.46,,,0.54,,19.09,,35.07,,
923,33,France,2001,6.89,20.01,28.71,,5.72,1.62,,,0.43,,8.37,,24.89,,
924,49,Germany,2001,23.65,30.11,41.75,,6.28,4.18,,,0.44,,17.44,,32.0,,
925,36,Hungary,2001,8.63,55.3,10.44,,10.86,5.9,,,0.57,,15.79,,20.25,,


### Established Business Ownership--null values

The single null value (Israel, 2001) is replaced with the closest data point.

In [72]:
df_BA.loc[df_BA['Established Business Ownership'].isna()]

Unnamed: 0,code,economy,year,Perceived opportunities,Perceived capabilities,Fear of failure rate *,Entrepreneurial intentions,Total early-stage Entrepreneurial Activity (TEA),Established Business Ownership,Entrepreneurial Employee Activity,Motivational Index,Female/Male TEA,Female/Male Opportunity-Driven TEA,High Job Creation Expectation,Innovation,Business Services Sector,High Status to Successful Entrepreneurs,Entrepreneurship as a Good Career Choice
928,972,Israel,2001,17.98,35.64,33.19,,5.29,,,,0.28,,47.47,,52.21,,


In [73]:
df_BA.loc[df_BA['economy']=='Israel']

Unnamed: 0,code,economy,year,Perceived opportunities,Perceived capabilities,Fear of failure rate *,Entrepreneurial intentions,Total early-stage Entrepreneurial Activity (TEA),Established Business Ownership,Entrepreneurial Employee Activity,Motivational Index,Female/Male TEA,Female/Male Opportunity-Driven TEA,High Job Creation Expectation,Innovation,Business Services Sector,High Status to Successful Entrepreneurs,Entrepreneurship as a Good Career Choice
18,972,Israel,2019,46.0,43.34,55.36,21.2,12.69,5.45,5.75,,0.69,,21.73,,27.06,84.13,64.21
70,972,Israel,2018,56.23,41.48,47.47,26.2,12.7,4.2,7.21,3.3,0.7,1.1,22.9,32.9,7.3,84.98,65.96
121,972,Israel,2017,58.29,44.14,47.96,26.42,12.78,3.32,8.55,2.02,0.72,1.0,8.66,26.7,27.3,86.07,65.16
183,972,Israel,2016,53.69,41.1,48.65,20.61,11.31,4.0,7.3,2.6,0.71,1.15,22.1,30.4,37.37,85.5,64.2
243,972,Israel,2015,55.5,41.56,47.76,21.59,11.82,3.9,6.55,3.29,0.65,1.02,23.6,30.78,32.9,86.24,64.48
374,972,Israel,2013,46.5,36.17,51.76,23.97,10.04,5.94,,2.83,0.48,1.03,23.84,34.13,31.91,80.3,60.61
445,972,Israel,2012,30.62,29.31,46.76,12.81,6.53,3.78,4.24,2.41,0.72,,21.33,29.15,24.05,72.39,59.47
564,972,Israel,2010,33.88,39.94,46.71,13.45,5.02,3.25,,2.24,0.5,,32.1,,26.9,73.23,60.12
619,972,Israel,2009,28.99,38.27,37.27,13.62,6.07,4.27,,,0.52,,32.43,,,73.24,61.4
674,972,Israel,2008,24.78,37.83,44.81,14.19,6.36,4.12,,,0.46,,23.77,,33.65,74.13,56.23


In [75]:
# Replace the single null (Israel, 2001) with the closest value
df_BA["Established Business Ownership"]=df_BA["Established Business Ownership"].fillna(5.66)
df_BA.loc[df_BA['economy']=='Israel']

Unnamed: 0,code,economy,year,Perceived opportunities,Perceived capabilities,Fear of failure rate *,Entrepreneurial intentions,Total early-stage Entrepreneurial Activity (TEA),Established Business Ownership,Entrepreneurial Employee Activity,Motivational Index,Female/Male TEA,Female/Male Opportunity-Driven TEA,High Job Creation Expectation,Innovation,Business Services Sector,High Status to Successful Entrepreneurs,Entrepreneurship as a Good Career Choice
18,972,Israel,2019,46.0,43.34,55.36,21.2,12.69,5.45,5.75,,0.69,,21.73,,27.06,84.13,64.21
70,972,Israel,2018,56.23,41.48,47.47,26.2,12.7,4.2,7.21,3.3,0.7,1.1,22.9,32.9,7.3,84.98,65.96
121,972,Israel,2017,58.29,44.14,47.96,26.42,12.78,3.32,8.55,2.02,0.72,1.0,8.66,26.7,27.3,86.07,65.16
183,972,Israel,2016,53.69,41.1,48.65,20.61,11.31,4.0,7.3,2.6,0.71,1.15,22.1,30.4,37.37,85.5,64.2
243,972,Israel,2015,55.5,41.56,47.76,21.59,11.82,3.9,6.55,3.29,0.65,1.02,23.6,30.78,32.9,86.24,64.48
374,972,Israel,2013,46.5,36.17,51.76,23.97,10.04,5.94,,2.83,0.48,1.03,23.84,34.13,31.91,80.3,60.61
445,972,Israel,2012,30.62,29.31,46.76,12.81,6.53,3.78,4.24,2.41,0.72,,21.33,29.15,24.05,72.39,59.47
564,972,Israel,2010,33.88,39.94,46.71,13.45,5.02,3.25,,2.24,0.5,,32.1,,26.9,73.23,60.12
619,972,Israel,2009,28.99,38.27,37.27,13.62,6.07,4.27,,,0.52,,32.43,,,73.24,61.4
674,972,Israel,2008,24.78,37.83,44.81,14.19,6.36,4.12,,,0.46,,23.77,,33.65,74.13,56.23
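
A hard-coded fill value has to be looked up by hand and would be applied to every null in the column. As a hedged alternative, the fill can be derived from neighbouring years of the same economy. The sketch below takes "closest" to mean the previous available year, falling back to the next one. That is an assumption about the intent, and the `toy` data is invented.

```python
import pandas as pd
import numpy as np

col = "Established Business Ownership"

# Toy stand-in for df_BA; values are illustrative only.
toy = pd.DataFrame({
    "economy": ["X", "X", "X", "Y", "Y"],
    "year": [2003, 2001, 2002, 2001, 2002],
    col: [4.0, np.nan, 5.0, 7.0, np.nan],
})

# Order rows chronologically within each economy.
ordered = toy.sort_values(["economy", "year"])

# Within each economy, carry the previous year's value forward, then
# fall back to the next year's value for nulls at the start of the series.
filled = ordered.groupby("economy")[col].transform(lambda s: s.ffill().bfill())

# Assignment aligns on the index, so the original row order is preserved.
toy[col] = filled
print(toy)
```

For a null in an economy's first surveyed year, as with Israel in 2001, the backward fill picks up the next available year.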


### Entrepreneurial Employee Activity, Motivational Index, Female/Male Opportunity-Driven TEA, Innovation, High Status to Successful Entrepreneurs, Entrepreneurship as a Good Career Choice, Business Services Sector, High Job Creation Expectation--missing values
Most of these columns have more than 100 missing values; Business Services Sector (38) and High Job Creation Expectation (3) have fewer. They will be used only for plotting purposes.

In [84]:
# Columns with missing values that are kept for plotting only
cols = [
    "Entrepreneurial Employee Activity",
    "Motivational Index",
    "Female/Male Opportunity-Driven TEA",
    "Innovation",
    "High Status to Successful Entrepreneurs",
    "Entrepreneurship as a Good Career Choice",
    "Business Services Sector",
    "High Job Creation Expectation",
]

# Count the nulls in each column
for col in cols:
    print(f"Missing values in {col} is :{df_BA[col].isna().sum()}")


Missing values in Entrepreneurial Employee Activity is :486
Missing values in Motivational Index is :396
Missing values in Female/Male Opportunity-Driven TEA is :577
Missing values in Innovation is :455
Missing values in High Status to Successful Entrepreneurs is :107
Missing values in Entrepreneurship as a Good Career Choice is :110
Missing values in Business Services Sector is :38
Missing values in High Job Creation Expectation is :3
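
Raw counts are easier to compare as a share of all rows, and a cut-off on that share can separate the sparsely populated columns from the nearly complete ones. The following is a minimal sketch; the `threshold` value and the `toy` frame are hypothetical choices for illustration, not part of the analysis above.

```python
import pandas as pd
import numpy as np

# Toy stand-in for df_BA; values are illustrative only.
toy = pd.DataFrame({
    "TEA": [1.0, 2.0, 3.0, 4.0],
    "Innovation": [np.nan, np.nan, 1.0, np.nan],
    "Business Services Sector": [1.0, np.nan, 2.0, 3.0],
})

# Fraction of missing values per column.
missing_share = toy.isna().mean()

# Hypothetical cut-off: columns above it are treated as plot-only.
threshold = 0.5
plot_only = missing_share[missing_share > threshold].index.tolist()

print(missing_share)
print(plot_only)
```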
