# Chemical Products Analysis

In this notebook I look at the Chemical products category from RAPEX, the EU rapid alert system for dangerous non-food products. I use all alerts from 2005 to 2018.

## Required imports

In [142]:
import requests
import pandas as pd
import codecs
import urllib.parse
import html5lib
from bs4 import BeautifulSoup
import plotly.offline as py
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot

init_notebook_mode(connected=True)
pd.set_option('display.max_colwidth', 300)

## Configuration

In [143]:
category = "Chemical products"
start_year = 2005
end_year = 2018
data_file = 'data/rapex_data.xls'

## Data download

In [144]:
print("Executing search...")
session = requests.Session()
category = urllib.parse.quote_plus(category)
rapex_search_url = "https://ec.europa.eu/consumers/consumers_safety/safety_products/rapex/alerts/?event=search.jsonData&search_year={}&productCategory={}&{}"
sorting = "iSortCol_0=1&sSortDir_0=desc&iSortingCols=1"
years = "%2C".join([str(i) for i in range(start_year, end_year + 1)])
search_url = rapex_search_url.format(years, category, sorting)
session.get(search_url)

print("Downloading Excel data...")
rapex_excel_url = "https://ec.europa.eu/consumers/consumers_safety/safety_products/rapex/alerts/?event=main.search.Excel"
result = session.get(rapex_excel_url)
content = result.text

print("Saving data to {}...".format(data_file))
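# Write with an explicit encoding so the result does not depend on the platform default
with codecs.open(data_file, 'w', encoding='utf-8') as f: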
    f.write(content)
print("Data download complete.")

Executing search...
Downloading Excel data...
Saving data to data/rapex_data.xls...
Data download complete.
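
The search URL above is assembled by hand with `str.format`. As a minimal sketch, the same query string could be built with `urllib.parse.urlencode`, which takes care of escaping the category name and the comma-separated list of years. The helper name `build_search_url` is hypothetical, and the parameter names are copied from the URL used above; whether the RAPEX endpoint still accepts them is an assumption.

```python
from urllib.parse import urlencode

# Base URL copied from the download cell above.
RAPEX_ALERTS_URL = "https://ec.europa.eu/consumers/consumers_safety/safety_products/rapex/alerts/"


def build_search_url(category, start_year, end_year):
    """Hypothetical helper: build the search URL for one category and a year range."""
    params = [
        ("event", "search.jsonData"),
        ("search_year", ",".join(str(y) for y in range(start_year, end_year + 1))),
        ("productCategory", category),
        ("iSortCol_0", 1),
        ("sSortDir_0", "desc"),
        ("iSortingCols", 1),
    ]
    # urlencode escapes each value with quote_plus: "," becomes "%2C" and a space becomes "+".
    return RAPEX_ALERTS_URL + "?" + urlencode(params)


url = build_search_url("Chemical products", 2005, 2018)
print(url)
```

Building the query from a list of pairs keeps the parameter order stable and avoids overwriting the `category` variable with its escaped form.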


## Load the data into a Pandas dataframe

In [145]:
html_table = pd.read_html(content, index_col=False)
df = pd.concat(html_table)
header = df.iloc[0]
df = df[1:]
df.columns = header

df.head()

Unnamed: 0,Year,Week,Risk level,Product user,Alert number,Alert submitted by,Category,Product,Brand,Name,...,Description,Country of origin,Counterfeit,Risk type,Technical defect,Risk,Measures adopted by notifying country,Products were found and measures were taken also in,Company recall page,URL of Case
1,2018,32,Serious risk,Consumer,A12/1174/18,Italy,Chemical products,Drain cleaner,ZAPEC,DISGORGANTE RAPIDO ZAPEC,...,750 ml of liquid for cleaning and unblocking drains in a yellow plastic bottle with blue cap.,Italy,,Chemical,"The safety cap easily opens even with the safety seals. The product contains sulphuric acid (concentration 94% in volume), which is corrosive and causes irritation by contact, ingestion or inhalation.","A person could easily come into contact with the product leading to skin, eye and lung irritation and burns. The product does not comply with the Classification, Labelling and Packaging (CLP) regulation.","=""Measures ordered by public authorities(to: Manufacturer): Withdrawal of the product from the market""",,,https://ec.europa.eu/consumers/consumers_safety/safety_products/rapex/alerts/?event=viewProduct&reference=A12/1174/18
2,2018,32,Serious risk,Consumer,A12/1162/18,Croatia,Chemical products,Tattoo ink,World famous tattoo ink,Black Sabbath,...,"Black tattoo ink, in a 30 ml plastic bottle with screw top.",United States,,Chemical,The product contains barium (measured value: 92 mg/kg).,"Salts of barium can be absorbed from the tattoo ink and have toxic effects. The Council of Europe Resolution ResAP (2008)1 on requirements and criteria for the safety of tattoos and permanent make-up, recommends that the level of barium does not exceed 50 mg/kg.","=""Measures ordered by public authorities(to: Retailer): Ban on the marketing of the product and any accompanying measures""",,,https://ec.europa.eu/consumers/consumers_safety/safety_products/rapex/alerts/?event=viewProduct&reference=A12/1162/18
3,2018,31,Serious risk,Consumer,A12/1122/18,Croatia,Chemical products,Tattoo ink,Eternal Ink,Bright Orange,...,"Bright Orange ink, plastic bottle with srcew top, 30 ml, MPG date 03.2017, exp. date 03. 2020,",United States,,Chemical,The product contains lead (measured value: 14.7 mg/kg) and barium (measured value: 75 mg/kg).,Exposure to lead is harmful for human health and can cause developmental neurotoxicity. Salts of barium can be absorbed from the tattoo ink and have toxic effects. The Council of Europe Resolution ResAP (2008)1 on requirements and criteria for the safety of tattoos and permanent make-up recommen...,"=""Measures ordered by public authorities(to: Retailer): Ban on the marketing of the product and any accompanying measures""",,,https://ec.europa.eu/consumers/consumers_safety/safety_products/rapex/alerts/?event=viewProduct&reference=A12/1122/18
4,2018,31,Serious risk,Consumer,A12/1120/18,Croatia,Chemical products,Tattoo ink,Intenze,Bright Red,...,Bright red tattoo ink in a plastic bottle with screw top.,United States,,Chemical,"The ink contains cadmium (measured value: 0.62 mg/kg), mercury (measured value: 0.32 mg/kg) and barium (measured value: 62 mg/kg).",Exposure to lead is harmful for human health and can cause developmental neurotoxicity. Salts of barium can be absorbed from the tattoo ink and have toxic effects. The Council of Europe Resolution ResAP (2008)1 on requirements and criteria for the safety of tattoos and permanent make-up recommen...,"=""Measures ordered by public authorities(to: Distributor): Ban on the marketing of the product and any accompanying measures""",,,https://ec.europa.eu/consumers/consumers_safety/safety_products/rapex/alerts/?event=viewProduct&reference=A12/1120/18
5,2018,30,Serious risk,Consumer,A12/1007/18,Cyprus,Chemical products,Glue,Unknown,Superglue,...,A yellow and black tube of glue in a black blister pack.,China,,Chemical,The product contains too much Chloroform (measured concentration 39% in weight).,Chloroform in high doses causes skin irritation and can damage health if inhaled or swallowed. The product does not comply with REACH.,"=""Measures ordered by public authorities(to: Retailer): Withdrawal of the product from the market""",,,https://ec.europa.eu/consumers/consumers_safety/safety_products/rapex/alerts/?event=viewProduct&reference=A12/1007/18
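
The header handling above (take the first row as column names, then drop it from the data) can be illustrated on a tiny frame. This is only a sketch: the three columns and two rows are a shortened copy of the output above, and the assumption that the parsed values arrive as strings is mine.

```python
import pandas as pd

# A small frame shaped like the parsed table: the real header arrives as the first data row.
raw = pd.DataFrame([
    ["Year", "Week", "Product"],
    ["2018", "32", "Drain cleaner"],
    ["2018", "32", "Tattoo ink"],
])

header = raw.iloc[0]            # first row holds the column names
table = raw[1:].copy()          # remaining rows are the data
table.columns = list(header)    # promote the first row to column labels
table = table.reset_index(drop=True)

# Assumed here: values are strings after parsing, so numeric columns are converted explicitly.
table["Year"] = table["Year"].astype(int)
table["Week"] = table["Week"].astype(int)

print(table)
```

The `reset_index(drop=True)` step is optional. The notebook keeps the original index, which is why its rows start at 1.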


### Columns in the dataset

In [146]:
list(df)

['Year',
 'Week',
 'Risk level',
 'Product user',
 'Alert number',
 'Alert submitted by',
 'Category',
 'Product',
 'Brand',
 'Name',
 'Type / number of model',
 'Batch number / Barcode',
 'OECD Portal Category',
 'Description',
 'Country of origin',
 'Counterfeit',
 'Risk type',
 'Technical defect',
 'Risk',
 'Measures adopted by notifying country',
 'Products were found and measures were taken also in',
 'Company recall page',
 'URL of Case']

## Initial Data analysis

### Number of samples per year

Let's start by looking at the distribution of the data by year.

In [147]:
year_counts = df['Year'].value_counts().rename_axis('Year').reset_index(name='Count')
# Keep the sorted result so that later cells also see the years in order
year_counts = year_counts.sort_values(by=['Year'])
year_counts

Unnamed: 0,Year,Count
13,2005,5
12,2006,15
11,2007,18
10,2008,21
6,2009,44
9,2010,29
8,2011,38
1,2012,57
2,2013,56
0,2014,64


The year with the most alerts is 2014 (64 alerts).

In [148]:
year_counts_plot = [go.Bar(x=year_counts['Year'], y=year_counts['Count'])]
py.iplot({
    'data': year_counts_plot,
    'layout': {
        'title': 'Data distribution by Year',
        'xaxis': {'title': 'Year'},
        'yaxis': {'title': 'Count'}
    }
})

### Product distribution

It is useful to know which products appear most often in the dataset.

In [149]:
df['Product'] = df['Product'].str.lower()
product_counts = df['Product'].value_counts().rename_axis('Product').reset_index(name='Count')
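# value_counts already sorts by count; assign the result so the explicit sort is not discarded
product_counts = product_counts.sort_values(by=['Count'], ascending=False)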

print("The most represented products are:")
product_counts.head(10)

The most represented products are:


Unnamed: 0,Product,Count
0,tattoo ink,169
1,glue,44
2,liquid for e-cigarettes,24
3,valve oil for musical instruments,16
4,tattoo or permanent make-up ink,12
5,super glue,9
6,rubber solution in bicycle repair kit,7
7,bicycle tyre repair kit,6
8,liquid for electronic cigarettes,6
9,poppers,5
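
The list contains near-duplicate labels, for example "liquid for e-cigarettes" and "liquid for electronic cigarettes". Below is a rough sketch of how such labels could be merged into broader groups by keyword, using only the ten counts shown above. The group names, the patterns and the `group_of` helper are illustrative assumptions and not part of the original analysis.

```python
import re
from collections import Counter

# Top-10 product counts copied from the table above.
top_products = {
    "tattoo ink": 169,
    "glue": 44,
    "liquid for e-cigarettes": 24,
    "valve oil for musical instruments": 16,
    "tattoo or permanent make-up ink": 12,
    "super glue": 9,
    "rubber solution in bicycle repair kit": 7,
    "bicycle tyre repair kit": 6,
    "liquid for electronic cigarettes": 6,
    "poppers": 5,
}

# Hypothetical keyword groups; the first matching pattern wins.
GROUP_PATTERNS = [
    ("tattoo", re.compile(r"tattoo")),
    ("glue", re.compile(r"glue")),
    ("e-cigarette", re.compile(r"e-cigarettes?|electronic\scigarettes?")),
    ("bicycle repair", re.compile(r"bicycle")),
]


def group_of(product):
    """Return the name of the first matching group, or the product label itself."""
    for name, pattern in GROUP_PATTERNS:
        if pattern.search(product):
            return name
    return product


grouped = Counter()
for product, count in top_products.items():
    grouped[group_of(product)] += count

for name, count in grouped.most_common():
    print(name, count)
```

These totals cover only the ten labels listed, so they undercount any group that also has rarer labels in the full dataset.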


### Plot the distribution

In [150]:
product_counts_plot = [go.Bar(x=product_counts['Product'], y=product_counts['Count'])]
py.iplot({
    'data': product_counts_plot,
    'layout': {
        'title': 'Data distribution by Product',
        'xaxis': {'title': 'Product'},
        'yaxis': {'title': 'Count'}
    }
})

### Risk level

In [151]:
df['Risk level'].value_counts().rename_axis('Risk').reset_index(name='Count')

Unnamed: 0,Risk,Count
0,Serious risk,503
1,Other risk level,44


## Risk analysis

Let's look at the risks which were the actual cause of the alert.

In [152]:
df['Risk'].value_counts().rename_axis('Risk').reset_index(name='Count').sort_values(by=['Count'], ascending=False).head(10)

Unnamed: 0,Risk,Count
0,"Aromatic amines can cause cancer, cell mutations and affect reproduction. The Council of Europe Resolution ResAP (2008)1 on requirements and criteria for the safety of tattoos and permanent make-up, recommends that aromatic amines with carcinogenic, mutagenic, reprotoxic or sensitising propertie...",6
1,"The product lacks the required labelling, child-resistant fastening and tactile warning of danger. Children could accidentally swallow some of the product leading to aspiration toxicity including chemical pneumonia. The product does not comply with the Regulation on the classification, labelling...",6
2,"The product poses a chemical risk because the benzene content is higher than that permitted under the Chemical Restriction Directive 76/769/EEC. Given that benzene is classified as a Category I carcinogen, the risk is posed by contact and inhalation.",5
3,"The product has an aspiration hazard and lacks the required labelling, child-resistant fastening and tactile warning of danger. Children could accidentally swallow some of the product leading to aspiration toxicity including chemical pneumonia. The product does not comply with the Regulation on ...",4
4,Chloroform in high doses causes skin irritation and can damage health if inhaled or swallowed. The product does not comply with the REACH Regulation.,3
5,The product poses a chemical risk because it contains isobutyl nitrite which is classified as a carcinogen cat. 2 (and mutagen cat. 3) and must not be provided to the general public.,3
6,"Some PAHs are carcinogenic, including benzo(a)pyrene. The Council of Europe Resolution ResAP (2008)1 on requirements and criteria for the safety of tattoos and permanent make-up, recommends that the level of benzo(a)pyrene (BaP) does not exceed 0.005 mg/kg and the total amount of PAHs does not e...",3
7,"The product poses a chemical risk because it contains Aniline, which is carcinogenic. In addition, the product may cause local reactions in the tissue, which might be related to the content of primary aromatic amines (PAA) and thus azo-dyes.",3
16,"The product poses a chemical risk because it contains chloroform, a category-3 toxic and carcinogenic product, at a level of over 10%. The product does not comply with REACH regulation.",2
23,"The product lacks the appropriate labelling and warning of danger as required for hazardous substances (skin sensitising, skin irritating and eye irritating). The product does not comply with the Regulation on the classification, labelling and packaging of substances and mixtures (CLP).",2


## Cancer Risk

### How many risks are related to cancer

In [198]:
# Flag alerts whose risk text mentions cancer; 'carcinogen' also matches 'carcinogenic'
df['Risk'] = df['Risk'].str.lower()
df['Cancer'] = df['Risk'].str.contains('cancer|carcinogen', na=False).astype(int)
cancer_counts = df['Cancer'].value_counts()
cancer_counts

0    363
1    184
Name: Cancer, dtype: int64
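
The keyword test used above for the `Cancer` flag can be sketched with Python's `re` module. Note that the pattern "carcinogen" also matches "carcinogenic", so two alternatives are enough. The three texts are shortened copies of risk descriptions shown earlier in this notebook, and the function name `is_cancer_related` is illustrative.

```python
import re

CANCER_PATTERN = re.compile(r"cancer|carcinogen")


def is_cancer_related(risk_text):
    """Return True if the lower-cased risk text mentions cancer or a carcinogen."""
    return bool(CANCER_PATTERN.search(risk_text.lower()))


samples = [
    "Aromatic amines can cause cancer, cell mutations and affect reproduction.",
    "The product poses a chemical risk because it contains Aniline, which is carcinogenic.",
    "Chloroform in high doses causes skin irritation and can damage health if inhaled or swallowed.",
]

flags = [is_cancer_related(text) for text in samples]
print(flags)  # [True, True, False]
```

A plain keyword match like this cannot tell whether the text asserts or denies a cancer risk, so the resulting counts are best read as an approximation.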

In [199]:
labels = ['Cancer Related','Other']
values = [cancer_counts[1],cancer_counts[0]]
trace = go.Pie(labels=labels, values=values)
py.iplot([trace], filename='basic_pie_chart')

Roughly one third of the alerts (184 of 547, about 34%) are cancer related.

### Which products are related to cancer risk: tattoo, glue, cleaning products or electronic cigarettes?

In [218]:
# Cancer flags (0/1) per product group, matched on the lower-cased product name
tattoo_products = df.loc[df['Product'].str.contains('tattoo', na=False), 'Cancer']
glue_products = df.loc[df['Product'].str.contains('glue', na=False), 'Cancer']
e_cigarettes = df.loc[df['Product'].str.contains(r'e-cigarettes|electronic\scigarettes|electronic\scigarette', na=False), 'Cancer']
cleaning_products = df.loc[df['Product'].str.contains('cleaner|cleaning', na=False), 'Cancer']

tattoo_products_counts = tattoo_products.value_counts()
glue_products_counts = glue_products.value_counts()
ecig_products_counts = e_cigarettes.value_counts()
cleaning_products_counts = cleaning_products.value_counts()

# No cancer-related alerts were found for these two groups, so set their count to zero explicitly
ecig_products_counts[1] = 0
cleaning_products_counts[1] = 0

groups = ['tattoo products', 'glue products', 'e-cigarette products', 'cleaning products']
trace1 = go.Bar(
    x=groups,
    y=[tattoo_products_counts[1], glue_products_counts[1], ecig_products_counts[1], cleaning_products_counts[1]],
    name='Cancer'
)
trace2 = go.Bar(
    x=groups,
    y=[tattoo_products_counts[0], glue_products_counts[0], ecig_products_counts[0], cleaning_products_counts[0]],
    name='Other Risk'
)

data = [trace1, trace2]
layout = go.Layout(
    barmode='stack'
)

fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='stacked-bar')
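
The manual `[1] = 0` assignments above are needed because `value_counts` omits values that never occur. One possible alternative, sketched here on made-up flag values rather than the real data, is to reindex the counts so that both 0 and 1 are always present.

```python
import pandas as pd

# Illustrative 0/1 flags for a product group with no cancer-related alerts (not real data).
flags = pd.Series([0, 0, 0, 0])

counts = flags.value_counts()
print(counts.get(1, 0))  # 0, because the label 1 is missing from the counts

# Reindex so that both outcomes are present, filling the missing one with zero.
full_counts = counts.reindex([0, 1], fill_value=0)
print(full_counts.loc[0], full_counts.loc[1])
```

With this approach the same code works whether or not a group has cancer-related alerts, so no group needs special handling.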

### Verify that there were no e-cigarette cancer related alerts

In [223]:
e_cig_risks = df.loc[df['Product'].str.contains(r'e-cigarettes|electronic\scigarettes|electronic\scigarette', na=False),'Risk']
e_cig_risks.str.contains('carcinogen|cancer').value_counts()

False    37
Name: Risk, dtype: int64

### Examples of e-cigarette risks

In [224]:
pd.DataFrame(e_cig_risks.sample(8))

Unnamed: 0,Risk
223,"the packaging lacks a clear reference to the presence of nicotine (measured value 1.11% by weight ) and does not contain an adequate safety label bearing risk-related indications, safety advice or a tactile danger warning. the user therefore has no information on the dangers incurred when the pr..."
233,"the products contain more than 1% of nicotine (8.70 mg/ml, wrongly labelled as 6 mg/ml; 18.91 mg/ml, wrongly labelled as 18 mg; and 25.17 mg/ml, wrongly labelled 24 mg/ml). the products do not contain an adequate safety label bearing risk-related indications, safety advice or a tactile danger wa..."
234,"the products pose a chemical risk because: 1) it contains more than 1% of nicotine (wrongly labelled as 18 mg/ml, it actually contains 20.36 mg/ml) and does not contain an adequate safety label bearing risk-related indications, safety advice or a tactile danger warning. the product does not comp..."
192,"the product contains nicotine (0.2%) yet the presence of nicotine is not adequately reported on the labelling. the bottle does not have a child-resistant safety closure, and it does not contain an adequate safety label bearing risk-related indications, safety advice or a tactile danger warning. ..."
365,"the product poses a chemical risk because it contains nicotine dosed at 1.2%, which falls within the ""toxic"" category when the preparation comes into contact with the skin. however, the packaging does not have a safety label (no pictogram, risk-related indications, advice, tactile danger warning..."
350,"the label of the product states that it does not contain nicotine, but the test results show that it contains about 8mg/g of this substance. the product poses very serious risks since the consumer could be misled and consider it as a non-nicotine product. vulnerable consumers and consumers with ..."
240,"the packaging lacks a clear reference to the presence of nicotine (1.82%), the product is not equipped with a child-resistant fastening and does not contain an adequate safety label bearing risk-related indications, safety advice or a tactile danger warning. the user therefore has no information..."
182,"the products contain more than 1% of nicotine (12.87 mg/ml) and do not have an adequate safety label bearing risk-related indications, safety advice or recommendations for correct and safe use of the product. the user therefore has no information to avoid the dangers incurred when the product co..."


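Conclusion: none of the 37 e-cigarette alerts mentions a cancer risk. The sampled alerts concern nicotine content and missing or inadequate safety labelling instead. This is in line with the common view that, because e-cigarettes deliver nicotine without the tar and many of the other cancer-linked chemicals found in tobacco, they’re thought to pose less of a cancer risk than traditional cigarettes.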

### Verify that there were no cleaning product cancer related alerts

In [225]:
cleaning_risks = df.loc[df['Product'].str.contains('cleaning|cleaner', na=False),'Risk']
cleaning_risks.str.contains('carcinogen|cancer').value_counts()

False    26
Name: Risk, dtype: int64

### Examples of cleaning product risks

In [227]:
pd.DataFrame(cleaning_risks.sample(8))

Unnamed: 0,Risk
227,"the product pose a risk of fire and burns because the package of the product does not warn consumers that the product contains highly flammable liquid. consequently, the consumer is not properly informed about the flammability of this mixture. the product does not comply with the clp regulation ..."
104,"the product is corrosive. as the tactile warning of danger, required for these kind of preparations, and the name of the substances as well as the precautionary statements are not indicated, the consumer has no information on a safe use of the product and the dangers incurred when the product co..."
54,"the gel is corrosive but lacks the appropriate hazard pictograms and child-resistant fastening, required for these kind of products. users have therefore no information on the dangers incurred when the product comes into contact with the skin or if it is ingested. the product does not comply wit..."
356,the product poses a chemical risk because it contains a basic solution with over 4% of sodium hydroxide.- the product must therefore be classified as “corrosive”. there is no pictogram “c” on the label and there is incorrect labelling of risk and safety warnings. the packaging has no child-resis...
292,the product poses a chemical risk because it contains 10% by weight of n-methyl-2-pyrrolidone. the product does not comply with the reach regulation.
213,"the product is corrosive. since it lacks important safety information and warnings, it may be misused and cause dangerous accidents. the product does not comply with the regulation (ec) no. 1272/2008 on classification, labelling and packaging of substances and mixtures."
462,"the product poses a chemical risk because it contains a basic solution with ph value 12.8. due to the high ph value, the product is classified as c, r35 ""corrosive"". additionally, the container does not bear the necessary warnings and instructions for safe use of the product. the product does no..."
527,there is a risk of damage to sight since the product is not provided with child-resistant closure (fastenings) and tangible symbol indicating the danger. due to the above risk the product poses serious risk to people with visual impairment and 3-11 year children. this product does not comply wit...


## Countries

The 'Alert submitted by' column contains the name of the country that submitted the alert.

In [221]:
df['Alert submitted by'].value_counts().rename_axis('Country').reset_index(name='Count')

Unnamed: 0,Country,Count
0,Germany,123
1,Italy,104
2,France,56
3,The Netherlands,55
4,Spain,41
5,Sweden,33
6,Lithuania,26
7,Cyprus,19
8,United Kingdom,16
9,Denmark,14


### Countries of origin

In [222]:
df['Country of origin'].value_counts().rename_axis('Country').reset_index(name='Count')

Unnamed: 0,Country,Count
0,United States,173
1,China,136
2,Unknown,46
3,Germany,26
4,Italy,24
5,United Kingdom,20
6,France,14
7,Poland,12
8,The Netherlands,11
9,Taiwan,9
