## Programming for Data Analysis - Project

### Problem statement

For this project you must create a data set by simulating a real-world phenomenon of
your choosing. You may pick any phenomenon you wish – you might pick one that is
of interest to you in your personal or professional life. 

Then, rather than collect data
related to the phenomenon, you should model and synthesise such data using Python.


We suggest you use the numpy.random package for this purpose.

Specifically, in this project you should:

* Choose a real-world phenomenon that can be measured and for which you could collect at least one-hundred data points across at least four different variables.

* Investigate the types of variables involved, their likely distributions, and their relationships with each other.

* Synthesise/simulate a data set as closely matching their properties as possible.

* Detail your research and implement the simulation in a Jupyter notebook – the data set itself can simply be displayed in an output cell within the notebook.

#### Note:
this project is about simulation – you must synthesise a data set. Some
students may already have some real-world data sets in their own files. It is okay to
base your synthesised data set on these should you wish (please reference it if you do),
but the main task in this project is to create a synthesised data set. The next section
gives an example project idea.

### Example project idea

As a lecturer I might pick the real-world phenomenon of the performance of students
studying a ten-credit module. After some research, I decide that the most interesting
variable related to this is the mark a student receives in the module - this is going to be
one of my variables (grade).

Upon investigation of the problem, I find that the number of hours on average a
student studies per week (hours), the number of times they log onto Moodle in the
first three weeks of term (logins), and their previous level of degree qualification (qual)
are closely related to grade. 

The hours and grade variables will be non-negative real numbers with two decimal places, logins will be a non-zero integer and qual will be a categorical variable with four possible values: none, bachelors, masters, or phd.

After some online research, I find that full-time post-graduate students study on average four hours per week with a standard deviation of a quarter of an hour and that a normal distribution is an acceptable model of such a variable. Likewise, I investigate the other three variables, and I also look at the relationships between the variables.
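As a minimal sketch of the example above, the hours variable could be drawn from a normal distribution with numpy.random. The seed, the clipping at zero and the variable names are assumptions made for illustration only; the mean, standard deviation and sample size come from the example text.

```python
import numpy as np

# Seed chosen arbitrarily so the sketch is reproducible (assumption)
rng = np.random.default_rng(0)

# Mean of 4 hours and standard deviation of 0.25 hours, for 200 students
hours = rng.normal(loc=4.0, scale=0.25, size=200)

# Keep values non-negative and round to two decimal places
hours = np.clip(hours, 0.0, None).round(2)

print(hours[:5])
print(round(hours.mean(), 2), round(hours.std(), 2))
```

The printed mean and standard deviation should sit close to 4 and 0.25, which gives a quick check that the simulated variable matches its intended properties.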

I devise an algorithm (or method) to generate such a data set, simulating values of the
four variables for two-hundred students. I detail all this work in my notebook, and then
I add some code in to generate a data set with those properties.

#### Reference: Malawi Evidence of Tobacco Companies Affecting Restrictions: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2741530/


#### Reference: Philip Morris smoking advertising in India (Reuters): https://www.reuters.com/investigates/special-report/pmi-india/

#### Reference: Number of tobacco smokers worldwide from 2000 to 2025, by country income: https://www.statista.com/statistics/937428/tobacco-smoking-numbers-globally-country-income/#__sid=js4

#### Reference: ASH Fact sheet: Tobacco and the Developing World (ASH): https://ash.org.uk/wp-content/uploads/2019/07/ASH-Factsheet_Developing-World_v3.pdf

#### Reference (The Conversation): https://theconversation.com/big-tobacco-goes-after-the-young-in-developing-markets-in-a-case-of-history-repeated-82043

#### Reference: Cigarette consumption per year, 1970-2015: https://www.bmj.com/content/bmj/365/bmj.l2231.full.pdf

#### Reference: tobacco industry Indonesia: https://www.statista.com/topics/5728/tobacco-industry-in-indonesia/

#### Reference: https://tobacco.publichealth.gsu.edu/resources/data/

#### Reference: Improving the implementation of tobacco control policies in low-and middle-income countries: a proposed framework: https://gh.bmj.com/content/4/6/e002078

#### Reference: cigarette labels: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4276461/

#### WHO report 2019, tobacco controls: https://www.who.int/publications/i/item/WHO-NMH-PND-2019.5

#### links for tobacco use: https://tobacco.publichealth.gsu.edu/resources/data/

#### Who info on India tobacco controls: https://www.who.int/tobacco/about/partners/bloomberg/ind/en/#:~:text=Several%20provisions%20of%20the%20law,is%20also%20restricted%20in%20India.

#### Percentage of deaths from smoking as a share overall: Global Burden of Disease Collaborative Network. Global Burden of Disease Study 2017 (GBD 2017) Results. Seattle, United States: Institute for Health Metrics and Evaluation (IHME), 2018.: http://ghdx.healthdata.org/gbd-results-tool

#### India smoking controls and effect on death rates: https://www.researchgate.net/publication/333906455_Recent_trends_of_tobacco_use_in_India

#### Sweden smoking and snus: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4154048/

#### solution for categorical variables dependency: https://stackoverflow.com/questions/32633977/how-to-create-categorical-variable-based-on-a-numerical-variable

In [1]:
import pandas as pd

### The Variables

1. **Country:** Categorical variable

> I will choose 3 developed countries where campaigns have been introduced to combat deaths from smoking. I will also choose 3 developing nations where little or no effort has been made to combat smoking.


2. **Years:** Categorical variable

> I will be choosing a 30 year period, from 1990 to 2020, with one observation for each country per year.


3. **Percentage of Adults Who Smoke:** Numerical variable 

> This datapoint should represent the percentage of adults that smoke. It will closely resemble the distribution of real data on this subject I discovered on *Our World in Data's* smoking series. 


4. **Rating of Government anti-smoking campaigns:** Numerical variable

> This figure should be a rating that reflects standardised rating variables commonly utilised to represent subjects that contain many variable elements. This hypothetical variable will be a floating point number between 1 and 5 and will combine the various efforts of government in restricting tobacco sale into a single figure. This variable will be the most difficult to simulate in a realistic way. I will achieve this by basing it on a similar rating used in a dataset on the number of deaths associated with opioid use per country.

5. **Influence of Tobacco Lobbying:** Categorical variable
 
> This variable should represent the influence that Big Tobacco and lobbying groups have in the countries and an overall reflection of their efforts to curtail government imposed tobacco control. The variable will contain 4 different categories stored as the following strings: "Weak", "Moderate", "Significant", "Strong". 
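One way to store a four-level variable such as Influence is as an ordered pandas categorical, so that the levels can be compared and sorted. The sketch below is illustrative only: the sample values are made up, and the ordering from "Weak" to "Strong" is an assumption based on the description above.

```python
import pandas as pd

# Category labels from the variable description, ordered weakest to strongest
levels = ["Weak", "Moderate", "Significant", "Strong"]

# Made-up sample values, for illustration only
influence = pd.Series(
    pd.Categorical(
        ["Strong", "Moderate", "Weak", "Significant", "Strong"],
        categories=levels,
        ordered=True,
    ),
    name="Influence",
)

# Integer codes follow the declared order (Weak=0 ... Strong=3)
print(influence.cat.codes.tolist())

# Ordered categoricals support comparisons against a category label
print((influence >= "Significant").tolist())
```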



## Basis for Variables

### Variable 1: Country

#### Comparison of Trends in Smoking Consumption

The countries included in this simulation will fall into two categories: low-income and high-income nations. For each category, 3 countries have been chosen. A separate dataset will be created for each category of country to allow for effective comparison of trends. All the countries included have been chosen as they exhibit varying degrees of smoking consumption, tobacco control and lobbying from the tobacco industry. It will be shown that lower income countries exhibit a trend towards increased or stable tobacco-related deaths.

The 6 countries included in this project are presented in the table below:

| Low-Income Countries| High-income Countries         |
|:---:| :---:|
| Indonesia |   Sweden       |
|    Myanmar    |   France       |
|  India      |   Austria       |

Section ___ below investigates the smoking trends in 1.Indonesia, 2.India and 3.Sweden.

#### Indonesia

The two graphs below display the percentage of people aged 15 and over in South East Asia and the Indian subcontinent who smoked in (1) 2007 and (2) 2018 (World Health Organization, Global Health Observatory Data Repository).

**2007: % Share of Adult Smoker (15 Years +)**
<img src="Smoking_se_asia_2007.PNG" alt="Drawing" style="width: 650px;"/>

**2018: % Share of Adult Smoker (15 Years +)**
<img src="Smoking_se_asia_2018.PNG" alt="Drawing" style="width: 650px;"/>

Firstly, we will look at Indonesia, a low-income country where tobacco lobby influence is said to be strong and where a high level of tobacco consumption has prevailed for the past 30 years, for a myriad of reasons. Recent studies on smoking in Indonesia point to factors such as a high degree of reliance on the tobacco crop, in conjunction with strong influence from Big Tobacco impeding restrictions, to explain the resilience of the tobacco industry. The alarming nature of the situation was clear back in 1999, when Catherine Reynolds noted specific details of the exponential increase in smoking:

> * Male participation estimates range from 50% to 85%
> * Since 1970–72, per adult consumption of cigarettes (all forms) has more than doubled, from 500 to 1180 per adult
> * By 1985, a Jakarta study reported that 49% of boys and 9% of girls aged 10–14 were daily smokers
> * By 1995, a health department survey estimated that 22.9% of urban 10 year olds, and 24.8% of rural 10 year olds smoke.
 
This appears to have been viewed with enthusiasm among some members of the Indonesian government, as Reynolds notes that a government report issued in 1991 stated: “Prospects for further market growth are considered good. Consumption levels per head of population are low by international standards. . . . A high proportion of Indonesia’s population is in the younger age groups, meaning that the potential population of smokers will be growing rapidly in the next decade at least.”

This bears out in the data presented by Our World in Data on smoking prevalence as a percentage of adults over 15 years of age, that shows a rise for Indonesia from 32.9% in 2000 to 39.4% in 2016 (Global Health Observatory Data Repository, 2016).

Keeping this in mind, Indonesia will be utilised as an example of a country where Big Tobacco influence is strong and smoking controls are weak. Myanmar displays a similar trend due to the same issues as Indonesia. As a result, the distributions of random data that will be created for these countries will be from the exponential distribution.
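As a rough sketch of how exponentially distributed draws could be turned into a rising prevalence series for Indonesia, the yearly increases can be sampled from NumPy's exponential distribution, accumulated, and rescaled to the reported 2000 and 2016 figures (32.9% and 39.4%). The seed, the scale parameter and the rescaling step are assumptions for illustration and are not taken from the notebook's own code.

```python
import numpy as np

# Seed chosen arbitrarily for reproducibility (assumption)
rng = np.random.default_rng(0)

years = np.arange(2000, 2017)
start, end = 32.9, 39.4  # reported prevalence in 2000 and 2016

# Yearly increases drawn from the exponential distribution (scale is an assumption)
steps = rng.exponential(scale=1.0, size=years.size - 1)

# Cumulative path starting at zero, then rescaled to run from start to end
path = np.concatenate(([0.0], np.cumsum(steps)))
prevalence = (start + (end - start) * path / path[-1]).round(1)

for year, value in zip(years, prevalence):
    print(year, value)
```

Because every increment is positive, the simulated series never falls, which matches the upward trend described for Indonesia.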

#### India

In contrast, India is a country that has successfully introduced smoking controls in recent years. In 2003, India overcame court challenges that sought to prevent the introduction of smoke-free public places and restrictions on tobacco advertising and promotion, amongst other measures. Following this, the country joined the WHO Framework Convention on Tobacco Control. Throughout the 2010s, the government incrementally brought in measures such as warning labels on tobacco products, higher taxes on products and increased powers for police to sanction those who break advertising laws (WHO, 2015).

Despite this, the percentage of deaths attributed to smoking, as a share of overall deaths in the country, rose from 7.76% in 1990 to 9.03% in 2017 (Global Burden of Disease Collaborative Network, 2017). This underlines that control measures have limits to their potential and take time to show results. Chhabra et al. (2019) outline that smoking remains high among middle-aged adults, less educated people and those living in rural areas. However, it is noted that the number of young and educated adult smokers has dropped significantly (Chhabra et al., 2019). This can account for the graph of current tobacco use above, whereby India drops from 38.5% of the population in 2007 to 27% in 2018.

With this understanding of the demographics, we can assume that the distribution of datapoints for India should differ from that of Indonesia. As this project will attempt to recreate realistic distributions using random data, in terms of deaths it is evident that India still suffers from a high death rate. Therefore, the datapoints of the Death Rates variable should be created in the Uniform Distribution, with some permutations added. For the variables of government controls and tobacco companies' profits, however, the random data produced should be separated into two categories: pre 2003 and post 2003. The introduction of the FCTC catalysed the introduction of control measures. The data before should resemble that of Indonesia and Myanmar. But following 2003, it should represent a comparative improvement.
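The pre-2003 and post-2003 split described above could be sketched as follows, drawing the control rating from a lower uniform range before 2003 and a higher one from 2003 onwards. The two ranges (0.5 to 2.0 and 2.0 to 4.0), the seed and the choice to count 2003 itself as "post" are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd

# Seed chosen arbitrarily for reproducibility (assumption)
rng = np.random.default_rng(0)

years = np.arange(2000, 2021)
before_fctc = years < 2003  # 2003 itself is treated as "post" (assumption)

# Hypothetical ranges: weaker controls before 2003, stronger controls afterwards
rating = np.where(
    before_fctc,
    rng.uniform(0.5, 2.0, size=years.size),
    rng.uniform(2.0, 4.0, size=years.size),
).round(1)

india = pd.DataFrame({"Years": years, "Control Rating": rating})
print(india.head(6))
```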

### Variable 2: Years

The chosen range of years to be present in the dataset is from 1990-2020. This thirty year period has been chosen as it provides an adequate window of time both before and after the introduction of the FCTC in 2003 by the WHO. As this framework was instrumental in the measures brought in across Europe and in India and thereby had a measurable effect on the levels of smoking, it is appropriate to study both the 13 year period before its conception and the 17 years since.

Additionally, the period 1980 - 2000 appears to have been a time of exponential growth in tobacco consumption in Indonesia and Myanmar. It is important to highlight the degree of this growth across all variables appropriately, and how the FCTC has not affected the death rate in these two tobacco reliant nations.

----------------------

### Variable 3: Smokers (% of adults)

#### Justification for Variable

There were various variables that were considered as a key to highlighting the relationship between an increase in smoking and tobacco lobbying against government controls. For example, the number of deaths from smoking is a very important indicator of the consequences of smoking. Similarly, the per capita daily consumption of cigarettes is another variable that may highlight aspects of smoking trends.

However, studying the number of annual deaths is complicated by the fact that deaths due to smoking can remain high even where efforts to restrict tobacco sale have been employed. Meanwhile, the consumption of cigarettes per capita is not a great reflection of an entire population's smoking trend, as it includes chain smokers, and the amount of cigarettes a smoker consumes varies from person to person.

In terms of highlighting the relationship with degrees of lobbying/restrictions, the percentage of adults who smoke is most appropriate for this simulation.

#### Low-Income Countries

The graph below details smoking as a percentage of adults (over 15) from 2000 to 2016, in relation to the six countries that this simulation concerns itself with:

**% Share of Adult Smokers (15 Years +)**
<img src="smoking_%_2016.PNG"/>

*Image from Our World in Data*

##### India

As was noted in the discussion of India, the number of deaths from smoking increased in the same period of time that the percentage of the population smoking decreased. This was due to the fact that whilst middle-aged people smoked more and were not particularly influenced by government measures, younger people had increasingly chosen not to smoke. As can be seen in the graph above, India actually experienced a drop of almost 10 percentage points from 2000 to 2016. This is a significant achievement for a low-income country. In creating random data for smoking prevalence in India, the Gamma distribution will be used to simulate this steep decline.

##### Indonesia

In contrast, Indonesia, with little government intervention and a strong degree of influence from the tobacco industry, experienced the only increase in smoking prevalence out of the 6 example countries. In 2000, 33% of Indonesians smoked, but by 2016 this figure was at 40%. The comparison between India and Indonesia best exemplifies the relationship between government intervention and decreasing smoking prevalence.

#### High-Income Countries

##### France

Looking at the above graph it is clear that France exhibits only a very slight annual decrease in the period 2000-2016. As seen below, the period 2013-2016 shows a consistent annual decline of 0.1%, for both daily and non-daily smokers (macrotrends.net):
* Smoking rate for 2016 was 32.70%, a 0.1% decline from 2015
* Smoking rate for 2015 was 32.80%, a 0.1% decline from 2014
* Smoking rate for 2014 was 32.90%, a 0.1% decline from 2013
* Smoking rate for 2013 was 33.00%, a 0.1% decline from 2012

As detailed in Section___, there are various reasons for the near stagnation of smoking prevalence decline in France. As such, the random data simulated for France will be in the Uniform Distribution. This data will then be permuted with NumPy.
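A minimal sketch of the approach described for France is given below: uniform draws within a narrow band, followed by a permutation. The band of 32.7% to 33.0% is taken from the 2013-2016 figures listed above, while the seed and the 17-year span (2000-2016) are assumptions for illustration.

```python
import numpy as np

# Seed chosen arbitrarily for reproducibility (assumption)
rng = np.random.default_rng(0)

n_years = 17  # 2000-2016 inclusive (assumption)

# Near-flat prevalence: uniform draws within the narrow 2013-2016 band
draws = rng.uniform(low=32.7, high=33.0, size=n_years).round(1)

# Permute the draws; this reorders the values without changing them
shuffled = rng.permutation(draws)

print(draws)
print(shuffled)
```

Since the uniform draws are already independent of one another, the permutation changes only the order in which the values appear, not their distribution.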

##### Austria

In contrast, Austria shows a dramatic decrease from a high of almost 50% in 2000 to 30% in 2016. This drop of almost 20 percentage points is significant and necessitates the creation of random data that reflects a sharp and consistent downwards trend. As such, NumPy's Gamma distribution function will be utilised.
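One possible way to obtain a consistently falling series from Gamma-distributed draws is to sort them in descending order and rescale them to the reported start and end values (roughly 50% in 2000 and 30% in 2016). This is only a sketch under stated assumptions: the shape and scale parameters, the seed and the sort-and-rescale step are illustrative choices, not the notebook's confirmed method.

```python
import numpy as np

# Seed chosen arbitrarily for reproducibility (assumption)
rng = np.random.default_rng(0)

years = np.arange(2000, 2017)
high, low = 50.0, 30.0  # approximate prevalence in 2000 and 2016

# Gamma draws (shape and scale are assumptions), sorted largest first
draws = rng.gamma(shape=2.0, scale=1.0, size=years.size)
ordered = np.sort(draws)[::-1]

# Rescale to the unit interval, then map onto the low..high range
unit = (ordered - ordered.min()) / (ordered.max() - ordered.min())
prevalence = (low + (high - low) * unit).round(1)

for year, value in zip(years, prevalence):
    print(year, value)
```

The right skew of the Gamma distribution means a few large values sit at the start of the sorted series, so the simulated decline is steepest in the early years.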

##### Sweden

Sweden exhibited the same dramatic trend as Austria; however, it achieved this much earlier. In 1988, 33% of men aged 35 to 44 smoked daily; by 2004, this had dropped to 13%, with the older age groups showing a drop of 10% (Ramstrom and Wikmans, 2014). As such, the random data for smoking prevalence for Sweden will also be in the Gamma distribution.

-----------------

### Variable 4: Rating of Government Anti-Smoking Campaigns

#### Smoking Deaths and the Relationship with Investment in Anti-smoking Campaigns

In the past two decades there has been a considerable effort by many developed nations to decrease smoking. This effort has taken many forms. Many countries have hiked taxes that consumers pay on cigarette purchases, in order to both deter people from smoking and to cover the cost of the healthcare that long-term smokers require. Additionally, restrictions have been put on the sale of cigarettes, such as getting rid of branding that tobacco companies are allowed to use. This follows on from laws introduced in the 1990s and 2000s making it illegal to advertise cigarettes to consumers. Finally, campaigns have been launched across Europe, America and in many developing nations (though not all) to highlight the high correlation between smoking and various forms of cancer.

The result is a considerable decrease in smoking-related deaths per 100,000 of the population in the countries where restrictions and campaigns were launched. This success should not be diminished - it highlights that a conscious effort can significantly affect the mindset of peoples and also forms part of a greater worldwide awareness of what we put into our bodies.

Despite this success, however, there is evidence that in some developing nations, where smoking is more prevalent to begin with and where little has been done to restrict the operations of tobacco companies in the marketing and sale of their products, smoking-related deaths have not decreased, but rather increased.

Looking at the sales of the top 5 tobacco companies, or 'Big Tobacco' as they are collectively known, in some countries these companies have seen an increase in revenues in recent years. Some articles have even highlighted how this is arguably the most profitable time in history for tobacco companies.

Having researched this phenomenon online and compared it against datasets on deaths from smoking accessed on *Our World in Data*, I have chosen to investigate this example and synthesise random data into particular distributions. These distributions are based on plausible conclusions drawn from reliable resources. As the focus of this project is on the methodology used in creating random data, giving it a realistic shape and investigating this simulation, this project is not intended to be empirical evidence.

--------------------

### Variable 5: Influence of Tobacco Lobbying

#### Concern Over the Tactics of Tobacco Industry in Low-income Countries

In July 2019, a major group responsible for highlighting the consequences of smoking, ASH (Action on Smoking and Health), opened their report on smoking in developing countries with the following statement:

> "Around 1.1 billion people aged 15 and over smoke, with 80% living in LMICs (low and middle income
countries). Tobacco growing and consumption have become concentrated in the developing world where
the health, economic, and environmental burden is heaviest and likely to increase."

ASH's assertion is backed up by data showing a surge in tobacco company profits, by studies carried out throughout Africa and Southeast Asia of tactics used by 'Big Tobacco' to curtail government efforts to tackle smoking, and by numerous verified reports of illegal marketing ploys in low-income countries.

Since 1984, British American Tobacco (BAT), in association with the International Tobacco Growers’ Association (ITGA), an organisation that was founded to lobby for Big Tobacco, has used various tactics to pressure low-income countries to abandon restrictions on the growing and sale of tobacco. In Malawi, a country that has a long history of tobacco crop farming, this led a government official to push back against the World Health Organisation on the issue of tobacco control, on the basis that it would negatively impact the economy of Malawi. Indeed, today tobacco sales account for 70% of Malawi's foreign earnings (Mamudu et al., 2009). This economic dependency, and the tactics of the tobacco lobbies, led the ITGA to platform tobacco growers' representatives against the bodies of the United Nations responsible for tobacco control. This tactic formed part of a larger strategy, drawing on representatives from the industry in low-income countries, and eventually led to the weakening of the focus on health in the UN's tobacco control narrative (Mamudu et al., 2009).

In 2017, in an exposé on the marketing ploys of Philip Morris in India, Reuters reported that the company responsible for the Marlboro brand had breached India's anti-smoking laws. The tactics used included the placement of colourful advertisements in kiosks across New Delhi and handing out free cigarettes to young people at parties (Reuters, 2017).

Indonesia is a country where tobacco growing and consumption are notably high and where tobacco companies focus on quelling restrictions. In 2017, it was reported that Indonesia consumed 322 million cigarettes. A survey in 2019 found that curiosity was the primary reason for Indonesians to start smoking, and that the average age at which a smoker had their first cigarette was 16-18 (Hirschmann, December 2020).

Overall, there is consensus that tobacco lobbying in low-income countries has had a considerable effect on the introduction of even minor forms of tobacco control. A study in 2016 specifically examined the introduction of health warning labels on cigarette packaging. These labels were compliant with the WHO's Framework Convention on Tobacco Control (FCTC). The study concluded that countries where state capacity was low were less likely to introduce the warning labels (Hiilamo and Glantz, 2016). This remains an ever-present issue when it comes to tackling cigarette consumption globally.

In [105]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.pyplot import hist
import seaborn as sns
# Set seed of the generator
rng = np.random.default_rng(0)

## Low-Income Countries Dataset

### Countries and Years Variables

In [106]:
# DataFrame with 63 rows: 3 countries x 21 years (2000-2020)
temp_df1 = pd.DataFrame({"Countries": ["India", "India", "India", "India", "India", "India", "India", "India", "India", "India", 
            "India", "India", "India", "India", "India", "India", "India", "India", "India", "India", 
            "India",
          "Indonesia", "Indonesia", "Indonesia", "Indonesia", "Indonesia", "Indonesia", "Indonesia", 
          "Indonesia","Indonesia","Indonesia",
          "Indonesia", "Indonesia", "Indonesia", "Indonesia", "Indonesia", "Indonesia", "Indonesia", 
          "Indonesia","Indonesia","Indonesia", "Indonesia",
          "Myanmar", "Myanmar", "Myanmar", "Myanmar", "Myanmar", "Myanmar", "Myanmar", "Myanmar", "Myanmar", "Myanmar", 
            "Myanmar", "Myanmar", "Myanmar", "Myanmar", "Myanmar", "Myanmar", "Myanmar", "Myanmar", "Myanmar", "Myanmar", 
            "Myanmar"],
                    
           "Years": [2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009,
         2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020,
         2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009,
         2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020,
         2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009,
         2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020] })
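The same 63-row frame of countries and years can also be built without typing each value by hand. The sketch below uses np.repeat and np.tile as one possible alternative; the variable names are illustrative and are not used elsewhere in the notebook.

```python
import numpy as np
import pandas as pd

countries = ["India", "Indonesia", "Myanmar"]
years = np.arange(2000, 2021)  # 2000 to 2020 inclusive, 21 years

# Repeat each country once per year, and tile the years once per country
frame = pd.DataFrame({
    "Countries": np.repeat(countries, years.size),
    "Years": np.tile(years, len(countries)),
})

print(frame.shape)
print(frame.iloc[[0, 20, 21, 62]])
```

Building the columns this way keeps the country and year counts in step, so adding a country or extending the range of years only requires changing one line.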

### Prevalence Variable

In [107]:
# Use rng.uniform() to create 63 random floats representing smoking prevalence (% of adults)
perc1 = rng.uniform(low=21.5, high=52.7, size=(63,))
# Round to 1 decimal place and assign to a new variable
prevelance1 = perc1.round(1)

### Consumption (Per Capita) Variable

In [108]:
# Generate 63 random integers in the range 500-1999 (high is exclusive) for the no. of cigarettes consumed per person (15 years or older)
per_capita1 = rng.integers(low=500, high=2000, size=(63,))

### Government Control Rating (0.0 - 6.0)

In [109]:
# Use rng.uniform() to create 63 random floats to represent the rating of restrictions against tobacco companies
rating1 = rng.uniform(low=0.5, high= 3.0, size=(63,) )
# Round to 1 decimal place and assign new variable
control_rate1 = rating1.round(1)

### Use the ravel() Method to Combine Columns for 1. Prevalence, 2. Consumption and 3. Control

In [110]:
# Create dataframe of numeric variables using ravel() method
temp_df2 = pd.DataFrame({'Control Rating': control_rate1.ravel(), 'Prevelance (%)': prevelance1.ravel(), 
                     'Consumption': per_capita1.ravel() })
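It may help to note what ravel() does here. It flattens an array to one dimension, and the columns themselves are combined by the DataFrame constructor. For arrays created with size=(63,), which are already one-dimensional, ravel() returns the same values unchanged. A small sketch, using an illustrative array rather than the notebook's own variables:

```python
import numpy as np

# Seed chosen arbitrarily for reproducibility (assumption)
rng = np.random.default_rng(0)

# A one-dimensional array, created the same way as the rating variable
rating = rng.uniform(low=0.5, high=3.0, size=(63,)).round(1)

print(rating.ndim)                             # number of dimensions
print(np.array_equal(rating, rating.ravel()))  # ravel() leaves 1-D data unchanged

# ravel() only has a visible effect on arrays with more than one dimension
grid = rating.reshape(3, 21)
print(grid.shape, grid.ravel().shape)
```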

### Degree of Influence of Tobacco Lobbying

The list of categories for the Influence variable has not yet been made dependent on the Control Rating variable.

In [135]:
# Create a third dataframe for the Influence variable with placeholder categories - these will be
# overwritten based on the Control Rating variable values
temp_df3 = pd.DataFrame({"Influence": ["Moderate", "Moderate", "Moderate", "Moderate","Moderate",
                       "Moderate", "Moderate", "Moderate", "Moderate","Moderate",
                       "Moderate", "Moderate",
                       "Significant", "Significant", "Significant", "Significant", "Significant", 
                       "Significant", "Significant", 
                       "Strong", "Strong", 
                       
                    "Moderate", "Moderate", "Moderate", "Moderate", "Significant", "Significant", "Significant", 
                    "Significant", "Significant", "Significant", "Significant", "Significant", "Significant", 
                    "Significant", "Significant", "Strong", "Strong", "Strong", "Strong", "Strong", "Strong", 
                       
                    "Moderate", "Moderate", "Moderate", "Moderate", "Significant", "Significant", "Significant", 
                    "Significant", "Significant", "Significant", "Significant", "Significant", "Significant", 
                    "Significant", "Significant", "Strong", "Strong", "Strong", "Strong", "Strong", "Strong"] } ) 

In [136]:
# Concatenate the dataframes column-wise (axis=1), setting the join parameter to 'inner'
dataset1 = pd.concat([temp_df1, temp_df3, temp_df2], axis=1, join='inner')
dataset1.head(10)

| | Countries | Years | Influence | Control Rating | Prevelance (%) | Consumption |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 0 | India | 2000 | Moderate | 2.9 | 41.4 | 1434 |
| 1 | India | 2001 | Moderate | 0.9 | 29.9 | 1370 |
| 2 | India | 2002 | Moderate | 2.9 | 22.8 | 1848 |
| 3 | India | 2003 | Moderate | 2.7 | 22.0 | 948 |
| 4 | India | 2004 | Moderate | 2.6 | 46.9 | 1853 |
| 5 | India | 2005 | Moderate | 1.7 | 50.0 | 1507 |
| 6 | India | 2006 | Moderate | 1.1 | 40.4 | 1835 |
| 7 | India | 2007 | Moderate | 2.5 | 44.3 | 799 |
| 8 | India | 2008 | Moderate | 2.8 | 38.5 | 1637 |
| 9 | India | 2009 | Moderate | 1.2 | 50.7 | 1913 |


#### Altering the 'Influence' Variable

In the first 10 lines of data printed above, the Influence variable has not yet been made dependent on the Control Rating variable.

In order to do this we use the Pandas .loc indexer and specify the ranges of 'Control Rating' that determine the value of the 'Influence' column.

For example, if the Control Rating is greater than or equal to 1, but less than 2.5, the 'Influence' datapoint for that row should read "Significant". This indicates that countries with a Control Rating from 1.0 up to (but not including) 2.5 exhibit a significant level of influence from the tobacco industry.

In [134]:
# Use .loc on dataset1 to select rows by Control Rating range and assign the matching Influence category
dataset1.loc[(dataset1["Control Rating"] >= 0) & (dataset1["Control Rating"] < 1), "Influence"] = "Strong"
dataset1.loc[(dataset1["Control Rating"] >= 1) & (dataset1["Control Rating"] < 2.5), "Influence"] = "Significant"
dataset1.loc[(dataset1["Control Rating"] >= 2.5) & (dataset1["Control Rating"] < 4.0), "Influence"] = "Moderate"
dataset1.loc[(dataset1["Control Rating"] >= 4.0) & (dataset1["Control Rating"] < 5.0), "Influence"] = "Weak"
dataset1.head(8)

| | Countries | Years | Influence | Control Rating | Prevelance (%) | Consumption |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 0 | India | 2000 | Moderate | 2.9 | 41.4 | 1434 |
| 1 | India | 2001 | Strong | 0.9 | 29.9 | 1370 |
| 2 | India | 2002 | Moderate | 2.9 | 22.8 | 1848 |
| 3 | India | 2003 | Moderate | 2.7 | 22.0 | 948 |
| 4 | India | 2004 | Moderate | 2.6 | 46.9 | 1853 |
| 5 | India | 2005 | Significant | 1.7 | 50.0 | 1507 |
| 6 | India | 2006 | Significant | 1.1 | 40.4 | 1835 |
| 7 | India | 2007 | Moderate | 2.5 | 44.3 | 799 |


Looking at the first 8 values for India, we can see that the Influence variable is determined based off the values from Control Rating.
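The same mapping from Control Rating to Influence can also be expressed in a single step with pd.cut, which bins a numeric column into labelled intervals. The sketch below is an alternative to the .loc approach, not the method used in this notebook; the sample ratings are made up, and the bin edges mirror the thresholds used above.

```python
import pandas as pd

# Made-up ratings chosen to sit on and around the bin edges
ratings = pd.Series([0.9, 1.0, 1.7, 2.5, 2.9, 4.0, 4.9], name="Control Rating")

# Bin edges mirror the .loc thresholds: [0, 1), [1, 2.5), [2.5, 4), [4, 5)
bins = [0.0, 1.0, 2.5, 4.0, 5.0]
labels = ["Strong", "Significant", "Moderate", "Weak"]

# right=False makes each interval include its lower edge and exclude its upper edge
influence = pd.cut(ratings, bins=bins, labels=labels, right=False)

print(pd.DataFrame({"Control Rating": ratings, "Influence": influence}))
```

Keeping the thresholds in one list makes it easier to see that the ranges do not overlap and leave no gaps between 0 and 5.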


#### Altering the 'Influence' Variable

In the rows of data printed above, the Influence variable has not yet been made dependent on the Control Rating variable. 

To do this, we use the Pandas .loc indexer to select rows by their 'Control Rating' value and assign the matching label to the 'Influence' column. 

For example, if the Control Rating is greater than or equal to 1 but less than 2.5, the 'Influence' value for that row should read "Significant". This indicates that countries with a Control Rating from 1.0 up to (but not including) 2.5 exhibit a significant level of influence from the tobacco industry.

In [77]:
# Use .loc on dataset1 to select rows by their Control Rating band and write the matching label into Influence
#dataset1.loc[(dataset1["Control Rating"] >= 0) & (dataset1["Control Rating"] < 1), "Influence"] = "Strong"
#dataset1.loc[(dataset1["Control Rating"] >= 1) & (dataset1["Control Rating"] < 2.5), "Influence"] = "Significant"
#dataset1.loc[(dataset1["Control Rating"] >= 2.5) & (dataset1["Control Rating"] < 4.0), "Influence"] = "Moderate"
#dataset1.loc[(dataset1["Control Rating"] >= 4.0) & (dataset1["Control Rating"] < 5.0), "Influence"] = "Weak"
#dataset1

Unnamed: 0,Countries,Years,Influence,Control Rating,Prevelance (%),Consumption
0,India,2000,Significant,1.8,41.4,1434
1,India,2001,Significant,1.6,29.9,1370
2,India,2002,Strong,0.6,22.8,1848
3,India,2003,Significant,1.0,22.0,948
4,India,2004,Moderate,2.9,46.9,1853
5,India,2005,Strong,0.9,50.0,1507
6,India,2006,Moderate,2.6,40.4,1835
7,India,2007,Moderate,2.6,44.3,799
8,India,2008,Significant,1.5,38.5,1637
9,India,2009,Significant,1.7,50.7,1913


## High-Income Countries Dataset

### Countries and Years Variables

In [21]:
# 63 rows in total: 3 countries, each repeated for the 21 years 2000-2020
temp_df4 = pd.DataFrame({"Countries": ["Austria",  "Austria",  "Austria",  "Austria",  "Austria",  "Austria",  "Austria",  
"Austria",  "Austria", "Austria", "Austria",  "Austria",  "Austria",  "Austria",  "Austria",  "Austria",  "Austria",  
"Austria",  "Austria", "Austria", "Austria",
                                  
"France", "France", "France", "France", "France", "France", "France", "France", "France", "France", "France", "France", 
"France", "France", "France", "France", "France", "France", "France", "France", "France", 
                    
"Sweden", "Sweden", "Sweden", "Sweden", "Sweden", "Sweden", "Sweden", "Sweden", "Sweden", "Sweden", "Sweden", "Sweden", 
"Sweden", "Sweden", "Sweden", "Sweden", "Sweden", "Sweden", "Sweden", "Sweden", "Sweden"],
                    
        
           "Years": [2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009,
         2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020,
         2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009,
         2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020,
         2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009,
         2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020] })
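
The repeated country names and year ranges above can also be generated rather than typed out. The sketch below is one possible way to do that with np.repeat and np.tile; the names `countries`, `years` and `country_years` are introduced only for this illustration.

```python
import numpy as np
import pandas as pd

countries = ["Austria", "France", "Sweden"]
years = np.arange(2000, 2021)  # 2000..2020 inclusive, 21 values

country_years = pd.DataFrame({
    # each country name repeated 21 times in a row
    "Countries": np.repeat(countries, len(years)),
    # the full year range repeated once per country
    "Years": np.tile(years, len(countries)),
})
print(country_years.shape)
```

Generating the columns this way keeps the two lists the same length by construction, so adding a country or extending the year range needs a change in one place only.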

### Prevalence Variable

In [22]:
# Use rng.uniform() to create 63 random floats in the range 5-38.5 to represent prevalence (%)
perc2 = rng.uniform(low=5, high=38.5, size=(63,))
# Round to 1 decimal place and assign to a new variable
prevelance2 = perc2.round(1)

### Consumption (Per Capita) Variable

In [23]:
# Generate 63 random integers in the range 100-1999 (the upper bound, 2000, is exclusive) to represent the no. of cigarettes consumed per person (15 years or older)
per_capita2 = rng.integers(low=100, high=2000, size=(63,))
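
Because rng.integers excludes its upper bound by default, the call above can never return 2000. The short sketch below contrasts the default with endpoint=True; the seed and the sample size of 1000 are arbitrary choices made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only so the sketch is repeatable

# Default behaviour: high is exclusive, so values lie in 100..1999
exclusive = rng.integers(low=100, high=2000, size=1000)

# endpoint=True makes high inclusive, so values lie in 100..2000
inclusive = rng.integers(low=100, high=2000, size=1000, endpoint=True)

print(exclusive.min(), exclusive.max())
print(inclusive.min(), inclusive.max())
```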

### Government Control Rating (0.0 - 5.0)

In [24]:
# Use rng.uniform() to create 63 random floats in the range 3.5-4.7 to represent the rating of restrictions against smoking companies
rating2 = rng.uniform(low=3.5, high= 4.7, size=(63,) )
# Round to 1 decimal place and assign new variable
control_rate2 = rating2.round(1)

### Use the ravel() Method to Combine Columns for 1. Prevalence, 2. Consumption and 3. Control

In [25]:
# Create dataframe of numeric variables using ravel() method
temp_df5 = pd.DataFrame({'Control Rating': control_rate2.ravel(), 'Prevelance (%)': prevelance2.ravel(), 
                     'Consumption': per_capita2.ravel() })
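
A note on ravel(): it flattens an array to one dimension, so on arrays that are already one-dimensional (as produced by size=(63,)) it changes nothing. The sketch below checks this on freshly generated data; the seed and the variable names are assumptions made for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
control = rng.uniform(low=3.5, high=4.7, size=63).round(1)
consumption = rng.integers(low=100, high=2000, size=63)

# Build the same dataframe with and without ravel()
with_ravel = pd.DataFrame({"Control Rating": control.ravel(),
                           "Consumption": consumption.ravel()})
without_ravel = pd.DataFrame({"Control Rating": control,
                              "Consumption": consumption})

# The arrays are already 1-D, so both dataframes hold identical data
print(control.ndim, with_ravel.equals(without_ravel))
```

ravel() only becomes necessary if the arrays were generated with a two-dimensional shape such as size=(63, 1).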

### Degree of Influence of Tobacco Lobbying

In [26]:
# Hard-coded Influence labels: 21 per country, in the same country order as temp_df4
temp_df6 = pd.DataFrame({"Influence": ["Moderate", "Moderate", "Moderate", "Moderate","Moderate",
                       "Moderate", "Moderate", "Moderate", "Moderate","Moderate",
                       "Moderate", "Moderate",
                       "Significant", "Significant", "Significant", "Significant", "Significant", 
                       "Significant", "Significant", 
                       "Strong", "Strong", 
                       
                    "Moderate", "Moderate", "Moderate", "Moderate", "Significant", "Significant", "Significant", 
                    "Significant", "Significant", "Significant", "Significant", "Significant", "Significant", 
                    "Significant", "Significant", "Strong", "Strong", "Strong", "Strong", "Strong", "Strong", 
                       
                    "Moderate", "Moderate", "Moderate", "Moderate", "Significant", "Significant", "Significant", 
                    "Significant", "Significant", "Significant", "Significant", "Significant", "Significant", 
                    "Significant", "Significant", "Strong", "Strong", "Strong", "Strong", "Strong", "Strong"] } ) 

In [36]:
# Concatenate the dataframes side by side (axis=1, i.e. column-wise)
dataset2 = pd.concat([temp_df4, temp_df6, temp_df5], axis=1, join='inner')
dataset2

Unnamed: 0,Countries,Years,Influence,Control Rating,Prevelance (%),Consumption
0,Austria,2000,Moderate,4.7,34.0,1964
1,Austria,2001,Moderate,4.4,7.0,626
2,Austria,2002,Moderate,4.1,17.8,342
3,Austria,2003,Moderate,4.0,19.4,1228
4,Austria,2004,Moderate,4.6,21.4,239
5,Austria,2005,Moderate,3.6,37.7,346
6,Austria,2006,Moderate,4.4,31.0,233
7,Austria,2007,Moderate,4.4,15.3,500
8,Austria,2008,Moderate,4.5,14.0,1750
9,Austria,2009,Moderate,3.9,33.9,1378
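
With axis=1, pd.concat lines rows up by their index labels, and join='inner' keeps only the labels present in every dataframe. The toy example below (hypothetical `left` and `right` dataframes) shows the difference from join='outer' when the inputs have different lengths.

```python
import pandas as pd

left = pd.DataFrame({"Countries": ["Austria", "France", "Sweden"]})  # index 0, 1, 2
right = pd.DataFrame({"Consumption": [1964, 626]})                   # index 0, 1

# inner: only index labels found in both dataframes survive
inner = pd.concat([left, right], axis=1, join="inner")
# outer: all labels are kept and gaps are filled with NaN
outer = pd.concat([left, right], axis=1, join="outer")

print(len(inner), len(outer))
```

In the notebook all three dataframes have 63 rows with the default 0..62 index, so the inner join drops nothing; it would silently drop rows if one of the inputs were shorter.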


In [31]:
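# Show up to 200 rows so the combined dataset is displayed in full
pd.set_option('display.max_rows', 200)

# Stack the low-income and high-income datasets row-wise (the default, axis=0)
frames = [dataset1, dataset2]

v = pd.concat(frames)
v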

Unnamed: 0,Countries,Years,Influence,Control Rating,Prevelance (%),Consumption
0,India,2000,Moderate,0.9,41.4,1434
1,India,2001,Moderate,0.6,29.9,1370
2,India,2002,Moderate,0.9,22.8,1848
3,India,2003,Moderate,0.9,22.0,948
4,India,2004,Moderate,0.8,46.9,1853
5,India,2005,Moderate,0.7,50.0,1507
6,India,2006,Moderate,0.6,40.4,1835
7,India,2007,Moderate,0.8,44.3,799
8,India,2008,Moderate,0.9,38.5,1637
9,India,2009,Moderate,0.6,50.7,1913
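
Stacking two dataframes that each carry a 0-based index produces repeated index labels in the result. One way to get a single running index is ignore_index=True, sketched here on two small made-up dataframes (`low` and `high` are illustrative names, not the notebook's variables).

```python
import pandas as pd

low = pd.DataFrame({"Countries": ["India", "India"], "Years": [2000, 2001]})
high = pd.DataFrame({"Countries": ["Austria", "Austria"], "Years": [2000, 2001]})

# Default: each input keeps its own index labels, so they repeat
stacked = pd.concat([low, high])
# ignore_index=True discards the old labels and renumbers from 0
renumbered = pd.concat([low, high], ignore_index=True)

print(stacked.index.tolist())
print(renumbered.index.tolist())
```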


In [None]:
# Scratch cell: build a dataframe from dataset1 (minus its first row), using dataset2's column names
#temp_df7 = list(zip(*[iter(dataset1)]))
temp_df8 = pd.DataFrame(dataset1.iloc[1:], columns=dataset2.columns)

# Create a variable listing both dataframes together
#dataframes = [temp_df1, temp_df2]

In [None]:
# Use rng.uniform() to create 63 random floats in the range 0.01-4.99 to represent the rating of restrictions against smoking companies
rating2 = rng.uniform(low=0.01, high=4.99, size=(63,))
# Round to 1 decimal place and assign to a new variable
control_rate2 = rating2.round(1)
print(control_rate2)

In [73]:
x = rng.integers(low=2.5, high=4.8, size=(63,))
y = a*x**(a-1.)


In [75]:
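# Draw 63 samples from a power distribution with shape parameter 6 (values lie in [0, 1])
a = 5. # shape (not used in the call below)
samples = 100 # not used in the call below
s = np.random.power(6, 63)
s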

array([0.93263736, 0.67367287, 0.75999104, 0.85245628, 0.9421442 ,
       0.99271363, 0.74480191, 0.98886243, 0.90445091, 0.95902654,
       0.76381203, 0.81018355, 0.95734795, 0.8559463 , 0.95674811,
       0.95965258, 0.99085017, 0.86183174, 0.90735769, 0.99166627,
       0.81141999, 0.98418397, 0.93579075, 0.74768277, 0.93964497,
       0.83586822, 0.5750916 , 0.75846416, 0.86681792, 0.8783417 ,
       0.95895872, 0.87649011, 0.45892769, 0.96844315, 0.98594206,
       0.88844405, 0.65241467, 0.59282069, 0.7943262 , 0.7955601 ,
       0.7740449 , 0.81373338, 0.55286014, 0.76675695, 0.92199839,
       0.87401611, 0.78084473, 0.99122177, 0.74131548, 0.91717476,
       0.96578138, 0.9017754 , 0.97468701, 0.85907645, 0.91812133,
       0.84821491, 0.73957688, 0.89054105, 0.74069257, 0.84916594,
       0.9610006 , 0.76757621, 0.75756159])
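
The power samples above all lie between 0 and 1. If the aim of this experiment was a skewed Control Rating (an assumption; the notebook does not say), one possible approach is to rescale the samples linearly into a target range. The range 2.5-4.4 below is borrowed from the `samples` tuple in the next cell and is otherwise hypothetical, as are the seed and the names `low`, `high` and `scaled`.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility

# 63 draws from a power distribution with shape 6: values in [0, 1], skewed towards 1
s = rng.power(6, size=63)

# Hypothetical target range for the rating
low, high = 2.5, 4.4

# Linear rescale from [0, 1] to [low, high], then round to 1 decimal place
scaled = (low + (high - low) * s).round(1)
print(scaled.min(), scaled.max())
```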

In [76]:
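# Draw 63 samples from an exponential distribution with scale 0.6
a = 63. # not used in the call below (the exponential distribution has no shape parameter)
samples = (2.5, 4.4) # not used in the call below
s = rng.exponential(scale=0.6, size=63)
s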

array([0.82552756, 0.37607861, 0.04120355, 0.07909776, 1.92404114,
       0.13688172, 0.02823741, 0.08614592, 0.26414697, 1.42498149,
       1.13635284, 1.07031822, 0.84672716, 0.08326833, 0.36889453,
       0.68510596, 0.06102769, 1.86076011, 1.13450045, 1.72759474,
       0.81503785, 0.66207993, 1.04627466, 0.04519369, 0.06804576,
       0.50032348, 0.5346236 , 0.13084774, 0.38677881, 0.46788939,
       1.9433741 , 0.10282244, 0.13590107, 0.89917865, 0.6826425 ,
       0.50492212, 0.65613004, 0.14680868, 0.262814  , 0.07769597,
       0.02105276, 2.47520067, 0.01052759, 0.28216626, 0.58905156,
       2.00373011, 0.50037571, 0.24641588, 0.09301193, 0.04016574,
       1.25267637, 0.38386941, 1.44304071, 0.0747324 , 0.16808262,
       0.15685311, 1.44632417, 0.8713005 , 0.35883243, 0.0818599 ,
       3.67222748, 0.07212923, 0.06720004])

## High-Income Countries Dataset

In [35]:


#print(df3)

# Use rng.uniform() to create 63 random floats in the range 0.5-13.3 to represent prevalence (%)
perc = rng.uniform(low=0.5, high=13.3, size=(63,))
# Round to 1 decimal place and assign to a new variable
prevelance = perc.round(1)



# Generate 63 random integers in the range 45-404 (the upper bound, 405, is exclusive) to represent profits (in billions)
profits = rng.integers(low=45, high=405, size=(63,))

In [36]:
rating1 = rng.uniform(low=0.5, high= 0.9, size=(63,) )
rate1 = rating1.round(1)

In [37]:
rating2 = rng.uniform(low=1.5, high= 1.9, size=(63,) )
rate2 = rating2.round(1)

In [38]:
x = rng.integers(low=2.5, high=4.8, size=(63,))
y = a*x**(a-1.)


In [39]:
# Use rng.uniform() to create 63 random floats in the range 0.01-4.99 to represent the rating of restrictions against smoking companies
rating = rng.uniform(low=0.01, high=4.99, size=(63,))
# Round to 1 decimal place and assign to a new variable
control_rate = rating.round(1)

In [40]:
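# Draw 63 samples from a power distribution with shape parameter 6 (values lie in [0, 1])
a = 5. # shape (not used in the call below)
samples = 100 # not used in the call below
s = np.random.power(6, 63)
s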

array([0.69492377, 0.69931311, 0.93073459, 0.86694533, 0.68917497,
       0.84430396, 0.6455113 , 0.99750621, 0.83641107, 0.91437195,
       0.94935707, 0.93159295, 0.93373238, 0.94135264, 0.69521187,
       0.80186601, 0.79590907, 0.64349973, 0.6493439 , 0.84253326,
       0.77190835, 0.93151743, 0.99030696, 0.88227516, 0.48782422,
       0.67478637, 0.91136636, 0.96402173, 0.8387655 , 0.66546291,
       0.88787321, 0.91485294, 0.83624961, 0.98608948, 0.71335155,
       0.99980283, 0.85736768, 0.97882788, 0.86737528, 0.93692223,
       0.83418644, 0.97363037, 0.93141446, 0.74348672, 0.82254794,
       0.9698168 , 0.92461615, 0.97193016, 0.95421702, 0.63928061,
       0.95004521, 0.96057981, 0.87264717, 0.71232539, 0.85187994,
       0.97226447, 0.88054171, 0.77234538, 0.99876702, 0.67873199,
       0.93028898, 0.73101966, 0.96068092])

In [41]:
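# Draw 63 samples from an exponential distribution with scale 0.6
a = 63. # not used in the call below (the exponential distribution has no shape parameter)
samples = (2.5, 4.4) # not used in the call below
s = rng.exponential(scale=0.6, size=63)
s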

array([9.65380400e-01, 4.06919119e-01, 1.16115109e-01, 6.95092469e-01,
       4.76035470e-01, 3.28578832e-01, 3.03246336e-01, 2.61021608e-01,
       6.37693459e-02, 5.10606286e-01, 4.55285019e-01, 6.59405387e-02,
       1.50369678e-01, 1.47285071e-01, 3.52974957e-01, 1.07161101e-03,
       9.20025466e-01, 3.66078153e-01, 2.90415095e-01, 2.92456967e-01,
       8.45538110e-01, 3.69055278e-01, 1.02489225e-02, 5.30514743e-01,
       2.99995284e-02, 2.63503832e-02, 4.40732885e-01, 1.89821508e-01,
       3.09255208e-01, 3.97016370e-01, 5.62780509e-02, 2.81008882e-01,
       2.01299186e+00, 9.58147039e-01, 5.02270626e-01, 7.73653686e-01,
       5.47163377e-02, 2.02664104e-01, 4.31642203e-01, 2.56412565e-01,
       1.70204703e-01, 2.99259702e-01, 8.03643779e-01, 8.29466459e-01,
       3.35539086e-01, 1.34244710e+00, 4.55025455e-01, 1.77948340e+00,
       9.04177598e-01, 3.29882164e-01, 2.67697028e-01, 5.33055303e-01,
       2.35690002e+00, 1.31380324e-01, 7.02260878e-02, 2.01596555e-01,
      

In [42]:
# Create dataframe of numeric variables using ravel() method
df1 = pd.DataFrame({'Prevelance (%)': prevelance.ravel(), 'Control Rating': control_rate.ravel(),
                     'Profits': profits.ravel() })

# Create dataframe for list of strings (countrys) - Split the list to have the first element as a column and the rest as 
# data 
data = list(zip(*[iter(df3)]))
df2 = pd.DataFrame(data[1:], columns=data[0])

# Create a variable listing both dataframes together
dataframes = [df3, df1]

# Concatenate the dataframes side by side (axis=1, i.e. column-wise)
result = pd.concat([df3, df1], axis=1, join='inner')
print(result)

   Countries  Years  Prevelance (%)  Control Rating  Profits
0      India   2000            12.8             2.4      196
1      India   2001             6.4             3.0       62
2      India   2002            12.7             0.5      248
3      India   2003             0.9             1.2       76
4      India   2004             1.3             4.0      397
..       ...    ...             ...             ...      ...
58   Myanmar   2016             7.1             4.5      100
59   Myanmar   2017             5.6             4.1      112
60   Myanmar   2018            10.6             4.3      402
61   Myanmar   2019             6.5             0.5      256
62   Myanmar   2020             9.9             1.5      375

[63 rows x 5 columns]


In [43]:
# Attempt without ravel(): wrapping each array in a list gives a single row whose cells each hold a whole array
df1 = pd.DataFrame({'deaths': [deaths],
                    'rate': [rate],
                   'profits': [profits] })
                    
                    
                    #deaths.ravel(), 'rate': rate.ravel,
                     #'profits': profits.ravel() })


#print(df1)

# Create dataframe for list of strings (countrys) - Split the list to have the first element as a column and the rest as 
# data 
#data = list(zip(*[iter(country)]))
#df2 = pd.DataFrame(data[1:], columns=data[0])

# Create a variable listing both dataframes together
dataframes = [df3, df1]

# Concatenate the dataframes side by side (axis=1, i.e. column-wise)
result = pd.concat([df3, df1], axis=1, join='inner')
print(result)

  Countries  Years                                             deaths  \
0     India   2000  [8.7, 4.0, 1.0, 0.7, 10.9, 12.2, 8.3, 9.8, 7.5...   

                                                rate  \
0  [4.96, 3.9, 2.43, 2.11, 4.38, 0.44, 3.54, 3.94...   

                                             profits  
0  [196, 62, 248, 76, 397, 230, 196, 189, 400, 31...  


In [48]:
# 63 rows in total: 3 countries, each repeated for the 21 years 2000-2020
df3 = pd.DataFrame({"countries": ["Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan",
           "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan",
           "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan",
           "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan",
          "Indonesia", "Indonesia", "Indonesia", "Indonesia", "Indonesia", "Indonesia", "Indonesia", 
          "Indonesia","Indonesia","Indonesia",
          "Indonesia", "Indonesia", "Indonesia", "Indonesia", "Indonesia", "Indonesia", "Indonesia", 
          "Indonesia","Indonesia","Indonesia", "Indonesia",
          "Ireland", "Ireland", "Ireland", "Ireland", "Ireland", "Ireland", "Ireland", "Ireland",
          "Ireland", "Ireland",
          "Ireland", "Ireland", "Ireland", "Ireland", "Ireland", "Ireland", "Ireland", "Ireland",
          "Ireland", "Ireland", "Ireland"],
           "years": [2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009,
         2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020,
         2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009,
         2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020,
         2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009,
         2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020] })
                   
#print(df3)

# Use rng.uniform() to create 63 random floats in the range 0.5-13.3 to represent the deaths per 100000
death = rng.uniform(low=0.5, high=13.3, size=(63,))
# Round to 1 decimal place and assign to a new variable
deaths = death.round(1)



# Generate 63 random integers in the range 45-404 (the upper bound, 405, is exclusive) to represent profits (in billions)
profits = rng.integers(low=45, high=405, size=(63,))

In [8]:
rating1 = rng.uniform(low=0.5, high= 0.9, size=(63,) )
rate1 = rating1.round(1)

In [9]:
rating2 = rng.uniform(low=1.5, high= 1.9, size=(63,) )
rate2 = rating2.round(1)

In [10]:
x = rng.integers(low=2.5, high=4.8, size=(63,))
y = a*x**(a-1.)


In [11]:
# Use rng.uniform() to create 63 random floats in the range 0.01-4.99 to represent the rating of restrictions against smoking companies
rating = rng.uniform(low=0.01, high=4.99, size=(63,))
# Round to 2 decimal places and assign new variable
rate = rating.round(2)

In [12]:
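# Draw 63 samples from a power distribution with shape parameter 6 (values lie in [0, 1])
a = 5. # shape (not used in the call below)
samples = 100 # not used in the call below
s = np.random.power(6, 63)
s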

array([0.92170334, 0.97170683, 0.88231392, 0.90987407, 0.94253798,
       0.94253706, 0.9026884 , 0.93492449, 0.8099636 , 0.98863849,
       0.96764242, 0.74992656, 0.90656386, 0.84737262, 0.93271523,
       0.79797171, 0.78574231, 0.9863003 , 0.86224809, 0.93587577,
       0.988303  , 0.87468021, 0.95259339, 0.86516856, 0.99623041,
       0.83266812, 0.90609247, 0.85042284, 0.89627465, 0.86628784,
       0.96205491, 0.94939612, 0.73073491, 0.7851296 , 0.54511911,
       0.88284216, 0.95833985, 0.76598721, 0.99019291, 0.8516341 ,
       0.99017827, 0.94819151, 0.84119945, 0.53286211, 0.99956187,
       0.64827476, 0.8070711 , 0.94164924, 0.62373052, 0.71779273,
       0.92683596, 0.97573042, 0.79555651, 0.74514517, 0.98338944,
       0.78658196, 0.76074331, 0.74475996, 0.71090579, 0.98749334,
       0.92256666, 0.90031145, 0.96161944])

In [13]:
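# Draw 63 samples from an exponential distribution with scale 0.6
a = 63. # not used in the call below (the exponential distribution has no shape parameter)
samples = (2.5, 4.4) # not used in the call below
s = rng.exponential(scale=0.6, size=63)
s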

array([0.24025112, 0.42293008, 0.42561094, 0.10685883, 0.38869525,
       0.53425124, 0.63615477, 2.09151956, 0.24950802, 0.08695003,
       0.16879019, 0.11846049, 1.64122504, 0.32575474, 0.18438887,
       0.31198908, 1.34094755, 0.54420463, 0.81395315, 0.52756186,
       0.12763587, 0.15127475, 0.12587199, 0.17333026, 0.09867235,
       0.18020034, 0.7968866 , 0.01375632, 0.13379955, 0.71149551,
       0.16831487, 0.54767649, 0.24046995, 0.12582019, 2.96029113,
       0.99486465, 0.43837008, 1.14959672, 1.45415009, 1.24195596,
       0.06141651, 0.96184977, 0.78849913, 1.97730921, 0.08255622,
       0.05654937, 0.02015871, 0.47114765, 1.37347569, 0.53692721,
       0.37575651, 0.1674567 , 0.88966929, 1.08703287, 0.50889954,
       1.11941657, 0.38538824, 0.20301705, 0.00587117, 0.29280928,
       0.64312243, 0.45036594, 2.19550336])

In [14]:
# Create dataframe of numeric variables using ravel() method
df1 = pd.DataFrame({'deaths': deaths.ravel(), 'rate': rate.ravel(),
                     'profits': profits.ravel() })

# Create dataframe for list of strings (countrys) - Split the list to have the first element as a column and the rest as 
# data 
data = list(zip(*[iter(df3)]))
df2 = pd.DataFrame(data[1:], columns=data[0])

# Create a variable listing both dataframes together
dataframes = [df3, df1]

# Concatenate the dataframes side by side (axis=1, i.e. column-wise)
result = pd.concat([df3, df1], axis=1, join='inner')
print(result)

   countries  years  deaths  rate  profits
0      India   2000     8.7  1.39      269
1      India   2001     4.0  1.81      253
2      India   2002     1.0  2.88      368
3      India   2003     0.7  2.64      152
4      India   2004    10.9  1.78      369
..       ...    ...     ...   ...      ...
58   Myanmar   2016     3.5  2.56      358
59   Myanmar   2017     1.2  0.98       50
60   Myanmar   2018     5.7  3.89       87
61   Myanmar   2019     3.0  4.33      355
62   Myanmar   2020     1.7  1.58       74

[63 rows x 5 columns]

