## Import data

### EMA data from https://www.ema.europa.eu/en/medicines/download-medicine-data#rare-disease-(orphan)-designations-section

In [1]:
import numpy as np
import pandas as pd

ema_data = pd.read_excel('data/Medicines_output_orphan_designations.xlsx', header=8)
ema_data.sort_values(by='First published', inplace=True)
ema_data

Unnamed: 0,Medicine name,Active substance,Agency product number,Date of first decision,Disease / condition,EU designation number,Status of orphan designation,First published,Revision date,URL
494,,Etilefrine,,2002-11-13 01:00:00,Treatment of low-flow priapism,EU/3/02/122,Positive,2002-12-13 01:09:21,2013-06-25 16:15:00,https://www.ema.europa.eu/en/medicines/human/o...
65,,Recombinant glycoprotein gp350 of Epstein-Barr...,,2002-10-22 00:00:00,Prevention of post-transplantation lympho-prol...,EU/3/02/118,Positive,2002-12-17 01:09:00,2002-12-17 01:09:00,https://www.ema.europa.eu/en/medicines/human/o...
631,,Iodine (131I) chimeric IgG monoclonal antibody...,,2002-03-19 01:00:00,Treatment of renal-cell carcinoma,EU/3/02/095,Withdrawn,2003-01-06 02:00:00,2014-06-18 18:45:00,https://www.ema.europa.eu/en/medicines/human/o...
1054,"Cerepro,",Adenovirus-mediated herpes-simplex-virus thymi...,EMEA/H/C/000694,2002-02-06 01:00:00,Treatment of high-grade glioma with subsequent...,EU/3/01/083,Positive,2003-01-06 02:00:00,2016-08-15 18:40:00,https://www.ema.europa.eu/en/medicines/human/o...
2117,,nitisinone,,2002-03-13 01:00:00,Treatment of alkaptonuria,EU/3/02/096,Withdrawn,2003-01-06 02:00:00,2020-04-01 15:50:00,https://www.ema.europa.eu/en/medicines/human/o...
...,...,...,...,...,...,...,...,...,...,...
2284,,(+)-Epicatechin,,2020-06-26 00:00:00,Treatment of Becker muscular dystrophy,EU/3/20/2293,Positive,2020-09-23 15:45:00,NaT,https://www.ema.europa.eu/en/medicines/human/o...
2285,,Sodium cromoglicate,,2020-06-26 00:00:00,Treatment of idiopathic pulmonary fibrosis,EU/3/20/2294,Positive,2020-09-23 15:45:00,NaT,https://www.ema.europa.eu/en/medicines/human/o...
2286,,Lys40(NODAGA-68Ga)NH2-exendin-4,,2020-06-26 00:00:00,Diagnosis of insulinoma,EU/3/20/2295,Positive,2020-09-24 12:32:00,NaT,https://www.ema.europa.eu/en/medicines/human/o...
2287,,axicabtagene ciloleucel,,2020-06-26 00:00:00,Treatment of marginal zone lymphoma,EU/3/20/2296,Positive,2020-09-24 14:40:00,NaT,https://www.ema.europa.eu/en/medicines/human/o...


### How many 'Positive', 'Withdrawn', and 'Negative' designations?

In [2]:
len(ema_data[ema_data['Status of orphan designation'] == 'Positive'])

1698

In [3]:
len(ema_data[ema_data['Status of orphan designation'] == 'Withdrawn'])

543

In [4]:
len(ema_data[ema_data['Status of orphan designation'] == 'Negative'])

30
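
The three counts above can also be obtained in a single call with `value_counts()`, which additionally surfaces any status value that was not checked by hand. A minimal sketch on made-up rows; the `toy` frame is hypothetical and only the column name is taken from the EMA table:

```python
import pandas as pd

# Hypothetical toy stand-in for ema_data (not the real EMA spreadsheet).
toy = pd.DataFrame({
    'Status of orphan designation': [
        'Positive', 'Withdrawn', 'Positive', 'Negative', 'Positive',
    ],
})

# One call returns the number of rows for every distinct status.
status_counts = toy['Status of orphan designation'].value_counts()
print(status_counts)
```

Comparing `status_counts.sum()` with `len(toy)` is a quick check that no row has a missing status.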

### How many entries have a 'NaT' revision date?
Entries with a NaT revision date may not have been passed on to the EC (NaT, "not a time", is how pandas marks a missing datetime).

In [5]:
# NaT ("not a time") marks a missing datetime; isnull() detects it directly
len(ema_data[ema_data['Revision date'].isnull()])

209

### How many of those are 'Positive'? (Could this be the fraction missing relative to the dataset matched on EU designation number?)

In [6]:
len(ema_data[(ema_data['Status of orphan designation'] == 'Positive') & ema_data['Revision date'].isnull()])

204
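
Filtering on two conditions is usually done by combining boolean masks with `&` rather than by indexing twice, which is what triggers a reindexing warning in pandas. A minimal sketch, assuming a made-up `toy` frame that borrows only the two column names from the EMA table:

```python
import pandas as pd

# Hypothetical toy rows; the values are invented for illustration.
toy = pd.DataFrame({
    'Status of orphan designation': ['Positive', 'Positive', 'Withdrawn', 'Positive'],
    'Revision date': pd.to_datetime(['2013-06-25', None, None, '2020-04-01']),
})

is_positive = toy['Status of orphan designation'] == 'Positive'
no_revision = toy['Revision date'].isna()  # True where the value is NaT

# Combine both conditions element-wise; summing a boolean mask counts the True entries.
n_positive_without_revision = int((is_positive & no_revision).sum())
print(n_positive_without_revision)
```

Each comparison needs its own parentheses when written inline, because `&` binds more tightly than `==` in Python.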

### European Commission data

In [7]:
ec_data = pd.read_csv('data/orphan.csv', header=2)
ec_data = ec_data.rename(columns={'EU #': 'EU designation number'})
ec_data

Unnamed: 0,EU designation number,Product,Indication,Sponsor,Designation date,Tradename - EU product # - Implemented on
0,EU/3/20/2351,Adeno-associated virus serotype 5 containing t...,Treatment of RDH12 mutation associated retinal...,MeiraGTx B.V.,19 Oct 2020,-
1,EU/3/20/2350,"Poly(oxy-1,2-ethanediyl), alpha-hydro-omega-me...",Treatment of hypoparathyroidism,Ascendis Pharma Bone Diseases A/S,19 Oct 2020,-
2,EU/3/20/2349,Miglustat,Treatment of neuronal ceroid lipofuscinosis,Theranexus S.A.S.,19 Oct 2020,-
3,EU/3/20/2348,"Poly(oxy-1,2-ethanediyl), alpha-(carboxymethyl...",Treatment of homocystinuria,Aeglea Biotherapeutics UK Limited,19 Oct 2020,-
4,EU/3/20/2347,Trehalose,Treatment of neuronal ceroid lipofuscinosis,Theranexus S.A.S.,19 Oct 2020,-
...,...,...,...,...,...,...
1735,EU/3/01/034,Gusperimus trihydrochloride,Treatment of Wegener’s granulomatosis,Nordic Group B.V.,29 Mar 2001,-
1736,EU/3/01/028,Inolimomab,Treatment of Graft versus Host Disease,Elsalys Biotech SA,05 Mar 2001,-
1737,EU/3/01/026,L-Lysine-N-acetyl-L-cysteinate,Treatment of cystic fibrosis,LABORATOIRES SMB SA,14 Feb 2001,-
1738,EU/3/00/013,Ethyl Eicosopentaenoate,Treatment of Huntington's disease,Amarin Neuroscience Limited,29 Dec 2000,-


### Cross-reference EMA entries with EC entries using the EU designation number

In [8]:
matched_data = pd.merge(ema_data, ec_data, how='inner', on=['EU designation number'])
matched_data

Unnamed: 0,Medicine name,Active substance,Agency product number,Date of first decision,Disease / condition,EU designation number,Status of orphan designation,First published,Revision date,URL,Product,Indication,Sponsor,Designation date,Tradename - EU product # - Implemented on
0,,Etilefrine,,2002-11-13 01:00:00,Treatment of low-flow priapism,EU/3/02/122,Positive,2002-12-13 01:09:21,2013-06-25 16:15:00,https://www.ema.europa.eu/en/medicines/human/o...,Etilefrin,Treatment of low flow priapism,Laboratoires SERB,13 Nov 2002,-
1,,Recombinant glycoprotein gp350 of Epstein-Barr...,,2002-10-22 00:00:00,Prevention of post-transplantation lympho-prol...,EU/3/02/118,Positive,2002-12-17 01:09:00,2002-12-17 01:09:00,https://www.ema.europa.eu/en/medicines/human/o...,Recombinant glycoprotein gp350 of Epstein-Barr...,Prevention of post transplantation lympho-prol...,Henogen S.A.,22 Oct 2002,-
2,"Cerepro,",Adenovirus-mediated herpes-simplex-virus thymi...,EMEA/H/C/000694,2002-02-06 01:00:00,Treatment of high-grade glioma with subsequent...,EU/3/01/083,Positive,2003-01-06 02:00:00,2016-08-15 18:40:00,https://www.ema.europa.eu/en/medicines/human/o...,Adenovirus-mediated Herpes simplex Virus-thymi...,Treatment of high-grade glioma with subsequent...,Boyd Consultants Limited,06 Feb 2002,-
3,"Prohippur,","Benzoic acid, sodium salt",EMEA/H/C/004150,2002-09-11 00:00:00,Treatment of non-ketotic hyperglycinaemia,EU/3/02/111,Positive,2003-01-08 01:09:00,2003-01-08 01:09:00,https://www.ema.europa.eu/en/medicines/human/o...,"Benzoic acid, sodium salt",Treatment of non-ketotic hyperglycinaemia,Ethicare GmbH,11 Sep 2002,-
4,,Autologous Renal Cell Tumour Vaccine,,2002-10-21 00:00:00,Treatment of renal-cell carcinoma,EU/3/02/116,Positive,2003-01-08 01:09:00,2003-01-08 01:09:00,https://www.ema.europa.eu/en/medicines/human/o...,Autologous renal cell tumor vaccine,Treatment of renal cell carcinoma,Liponova GmbH,21 Oct 2002,-
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1663,,(+)-Epicatechin,,2020-06-26 00:00:00,Treatment of Becker muscular dystrophy,EU/3/20/2293,Positive,2020-09-23 15:45:00,NaT,https://www.ema.europa.eu/en/medicines/human/o...,(+)-epicatechin,Treatment of Becker muscular dystrophy,MWB Consulting S.A.R.L.,26 Jun 2020,-
1664,,Sodium cromoglicate,,2020-06-26 00:00:00,Treatment of idiopathic pulmonary fibrosis,EU/3/20/2294,Positive,2020-09-23 15:45:00,NaT,https://www.ema.europa.eu/en/medicines/human/o...,Sodium cromoglicate,Treatment of idiopathic pulmonary fibrosis,IQVIA RDS Spain S.L.,26 Jun 2020,-
1665,,Lys40(NODAGA-68Ga)NH2-exendin-4,,2020-06-26 00:00:00,Diagnosis of insulinoma,EU/3/20/2295,Positive,2020-09-24 12:32:00,NaT,https://www.ema.europa.eu/en/medicines/human/o...,Lys40(NODAGA-68Ga)NH2-exendin-4,Diagnosis of insulinoma,Stichting Katholieke Universiteit,26 Jun 2020,-
1666,,axicabtagene ciloleucel,,2020-06-26 00:00:00,Treatment of marginal zone lymphoma,EU/3/20/2296,Positive,2020-09-24 14:40:00,NaT,https://www.ema.europa.eu/en/medicines/human/o...,Axicabtagene ciloleucel,Treatment of marginal zone lymphoma,Kite Pharma EU B.V.,26 Jun 2020,-
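
An inner merge silently drops every designation number that appears in only one source. To see what is lost on each side, one option is an outer merge with `indicator=True`. This is a sketch on hypothetical toy tables (`ema_toy`, `ec_toy`, and the placeholder keys `A` to `D` are invented); only the key column name matches the real data:

```python
import pandas as pd

# Hypothetical toy tables with placeholder designation numbers.
ema_toy = pd.DataFrame({
    'EU designation number': ['A', 'B', 'C'],
    'Status of orphan designation': ['Positive', 'Withdrawn', 'Positive'],
})
ec_toy = pd.DataFrame({
    'EU designation number': ['A', 'C', 'D'],
    'Sponsor': ['Sponsor 1', 'Sponsor 2', 'Sponsor 3'],
})

# An outer merge keeps every row; the `_merge` column records its origin.
audit = pd.merge(ema_toy, ec_toy, how='outer',
                 on='EU designation number', indicator=True)

only_in_ema = audit[audit['_merge'] == 'left_only']
only_in_ec = audit[audit['_merge'] == 'right_only']
print(audit['_merge'].value_counts())
```

On the real tables, inspecting the status and revision date of the `left_only` rows would show directly which EMA entries have no EC counterpart.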


### Check the orphan status of matched designations

In [9]:
matched_data['Status of orphan designation'].unique()

array(['Positive', 'Withdrawn'], dtype=object)

### How many are 'Withdrawn'?

In [10]:
matched_data[matched_data['Status of orphan designation'] == 'Withdrawn']

Unnamed: 0,Medicine name,Active substance,Agency product number,Date of first decision,Disease / condition,EU designation number,Status of orphan designation,First published,Revision date,URL,Product,Indication,Sponsor,Designation date,Tradename - EU product # - Implemented on
1009,,Live attenuated Listeria monocytogenes delta a...,,2015-12-14 01:00:00,Treatment of malignant mesothelioma,EU/3/15/1594,Withdrawn,2016-02-03 17:30:00,2020-01-13 10:41:00,https://www.ema.europa.eu/en/medicines/human/o...,Live attenuated Listeria monocytogenes delta a...,Treatment of malignant mesothelioma,"Aduro Biotech Holdings, Europe B.V.",14 Dec 2015,-


### Get entries where 'Product' and 'Active substance' coincide

In [11]:
# Lower-case both name columns to simplify matching
matched_data['Active substance'] = matched_data['Active substance'].str.lower()
matched_data['Product'] = matched_data['Product'].str.lower()

product_filtered_data = matched_data[matched_data['Active substance'] == matched_data['Product']]
product_filtered_data

Unnamed: 0,Medicine name,Active substance,Agency product number,Date of first decision,Disease / condition,EU designation number,Status of orphan designation,First published,Revision date,URL,Product,Indication,Sponsor,Designation date,Tradename - EU product # - Implemented on
1,,recombinant glycoprotein gp350 of epstein-barr...,,2002-10-22 00:00:00,Prevention of post-transplantation lympho-prol...,EU/3/02/118,Positive,2002-12-17 01:09:00,2002-12-17 01:09:00,https://www.ema.europa.eu/en/medicines/human/o...,recombinant glycoprotein gp350 of epstein-barr...,Prevention of post transplantation lympho-prol...,Henogen S.A.,22 Oct 2002,-
3,"Prohippur,","benzoic acid, sodium salt",EMEA/H/C/004150,2002-09-11 00:00:00,Treatment of non-ketotic hyperglycinaemia,EU/3/02/111,Positive,2003-01-08 01:09:00,2003-01-08 01:09:00,https://www.ema.europa.eu/en/medicines/human/o...,"benzoic acid, sodium salt",Treatment of non-ketotic hyperglycinaemia,Ethicare GmbH,11 Sep 2002,-
5,,thymalfasin,,2002-07-30 02:00:00,Treatment of hepatocellular carcinoma,EU/3/02/110,Positive,2003-01-08 01:09:21,2003-01-08 01:09:21,https://www.ema.europa.eu/en/medicines/human/o...,thymalfasin,Treatment of hepatocellular carcinoma,SciClone Pharmaceuticals Italy S.r.l,30 Jul 2002,-
8,"NexoBrid,",purified bromelain,EMEA/H/C/002246,2002-07-30 02:00:00,Treatment of partial deep dermal and full-thic...,EU/3/02/107,Positive,2003-01-08 02:00:00,2013-09-19 12:00:00,https://www.ema.europa.eu/en/medicines/human/o...,purified bromelain,Treatment of partial deep dermal and full thic...,MediWound Germany GmbH,30 Jul 2002,NexoBrid - \nEU/1/12/803 - \n20 Dec 2012
10,"Voraxaze,",carboxypeptidase g2,EMEA/H/C/000681,2003-02-03 01:00:00,Adjunctive treatment in patients at risk of me...,EU/3/02/128,Positive,2003-02-17 02:00:00,2020-04-02 09:20:00,https://www.ema.europa.eu/en/medicines/human/o...,carboxypeptidase g2,Adjunctive treatment in patients at risk of me...,Protherics Medicines Development Europe B.V.,03 Feb 2003,-
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1663,,(+)-epicatechin,,2020-06-26 00:00:00,Treatment of Becker muscular dystrophy,EU/3/20/2293,Positive,2020-09-23 15:45:00,NaT,https://www.ema.europa.eu/en/medicines/human/o...,(+)-epicatechin,Treatment of Becker muscular dystrophy,MWB Consulting S.A.R.L.,26 Jun 2020,-
1664,,sodium cromoglicate,,2020-06-26 00:00:00,Treatment of idiopathic pulmonary fibrosis,EU/3/20/2294,Positive,2020-09-23 15:45:00,NaT,https://www.ema.europa.eu/en/medicines/human/o...,sodium cromoglicate,Treatment of idiopathic pulmonary fibrosis,IQVIA RDS Spain S.L.,26 Jun 2020,-
1665,,lys40(nodaga-68ga)nh2-exendin-4,,2020-06-26 00:00:00,Diagnosis of insulinoma,EU/3/20/2295,Positive,2020-09-24 12:32:00,NaT,https://www.ema.europa.eu/en/medicines/human/o...,lys40(nodaga-68ga)nh2-exendin-4,Diagnosis of insulinoma,Stichting Katholieke Universiteit,26 Jun 2020,-
1666,,axicabtagene ciloleucel,,2020-06-26 00:00:00,Treatment of marginal zone lymphoma,EU/3/20/2296,Positive,2020-09-24 14:40:00,NaT,https://www.ema.europa.eu/en/medicines/human/o...,axicabtagene ciloleucel,Treatment of marginal zone lymphoma,Kite Pharma EU B.V.,26 Jun 2020,-


### Get entries where 'Product' and 'Active substance' do not match

In [12]:
# Rows where the lower-cased names differ between the two sources
product_filtered_data = matched_data[matched_data['Active substance'] != matched_data['Product']]
product_filtered_data

Unnamed: 0,Medicine name,Active substance,Agency product number,Date of first decision,Disease / condition,EU designation number,Status of orphan designation,First published,Revision date,URL,Product,Indication,Sponsor,Designation date,Tradename - EU product # - Implemented on
0,,etilefrine,,2002-11-13 01:00:00,Treatment of low-flow priapism,EU/3/02/122,Positive,2002-12-13 01:09:21,2013-06-25 16:15:00,https://www.ema.europa.eu/en/medicines/human/o...,etilefrin,Treatment of low flow priapism,Laboratoires SERB,13 Nov 2002,-
2,"Cerepro,",adenovirus-mediated herpes-simplex-virus thymi...,EMEA/H/C/000694,2002-02-06 01:00:00,Treatment of high-grade glioma with subsequent...,EU/3/01/083,Positive,2003-01-06 02:00:00,2016-08-15 18:40:00,https://www.ema.europa.eu/en/medicines/human/o...,adenovirus-mediated herpes simplex virus-thymi...,Treatment of high-grade glioma with subsequent...,Boyd Consultants Limited,06 Feb 2002,-
4,,autologous renal cell tumour vaccine,,2002-10-21 00:00:00,Treatment of renal-cell carcinoma,EU/3/02/116,Positive,2003-01-08 01:09:00,2003-01-08 01:09:00,https://www.ema.europa.eu/en/medicines/human/o...,autologous renal cell tumor vaccine,Treatment of renal cell carcinoma,Liponova GmbH,21 Oct 2002,-
6,,tgf-ß2-specific phosphorothioate antisense oli...,,2002-03-22 02:00:00,Treatment of high-grade glioma,EU/3/02/091,Positive,2003-01-08 02:00:00,2014-04-03 16:11:00,https://www.ema.europa.eu/en/medicines/human/o...,tgf-beta2 specific phosphorothioate antisense ...,Treatment of high-grade glioma,Dr Ulrich Granzer,22 Mar 2002,-
7,,chimeric igg monoclonal antibody cg250 (girent...,,2002-03-19 01:00:00,Treatment of renal-cell carcinoma,EU/3/02/094,Positive,2003-01-08 02:00:00,2017-11-22 18:50:00,https://www.ema.europa.eu/en/medicines/human/o...,chimeric igg monoclonal antibody cg250,Treatment of renal cell carcinoma,Heidelberg Pharma AG,19 Mar 2002,-
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1564,,sodium benzoate,,2019-06-28 00:00:00,Treatment of argininosuccinic aciduria,EU/3/19/2178,Positive,2019-10-14 15:25:00,NaT,https://www.ema.europa.eu/en/medicines/human/o...,"sodium benzoate, sodium phenylacetate",Treatment of argininosuccinic aciduria,Dipharma B.V.,28 Jun 2019,-
1565,,sodium benzoate,,2019-06-28 00:00:00,Treatment of hyperargininaemia,EU/3/19/2179,Positive,2019-10-14 15:40:00,NaT,https://www.ema.europa.eu/en/medicines/human/o...,"sodium benzoate, sodium phenylacetate",Treatment of hyperargininaemia,Dipharma B.V.,28 Jun 2019,-
1592,,autologous peripheral blood t cells cd4 and cd...,,2019-11-13 01:00:00,Treatment of mantle cell lymphoma,EU/3/19/2220,Positive,2020-02-13 09:56:00,NaT,https://www.ema.europa.eu/en/medicines/human/o...,autologous peripheral blood t cells cd4 and cd...,Treatment of mantle cell lymphoma,Kite Pharma EU B.V.,13 Nov 2019,-
1616,,combination of three adeno-associated viral ve...,,2020-02-28 01:00:00,Treatment of inherited retinal dystrophies,EU/3/20/2254,Positive,2020-05-06 11:30:00,NaT,https://www.ema.europa.eu/en/medicines/human/o...,combination of three adeno-associated viral ve...,Treatment of inherited retinal dystrophies,Fondazione Telethon,28 Feb 2020,-


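The two columns seem to describe the same substances. In most of the rows shown, the differences are probably just documentation discrepancies, such as spelling variants ('etilefrine' vs 'etilefrin', 'tumour' vs 'tumor').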
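
One way to test that impression instead of eyeballing the table is to score each pair with a string-similarity ratio. The sketch below uses `difflib.SequenceMatcher` from the standard library on three pairs copied from the mismatch output above. The `similarity` helper and the `THRESHOLD` of 0.9 are assumptions made for illustration, not part of the original analysis:

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the strings are identical."""
    return SequenceMatcher(None, a, b).ratio()


# (EMA 'Active substance', EC 'Product') pairs from the mismatch table.
pairs = [
    ('etilefrine', 'etilefrin'),
    ('autologous renal cell tumour vaccine', 'autologous renal cell tumor vaccine'),
    ('sodium benzoate', 'sodium benzoate, sodium phenylacetate'),
]

THRESHOLD = 0.9  # assumed cut-off, chosen for illustration only
for ema_name, ec_name in pairs:
    score = similarity(ema_name, ec_name)
    label = 'likely spelling variant' if score >= THRESHOLD else 'check manually'
    print(f'{score:.2f}  {label}: {ema_name!r} vs {ec_name!r}')
```

Pairs that fall below the cut-off, such as a single substance listed against a combination product, are the ones worth reviewing by hand.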

In [13]:
matched_data

Unnamed: 0,Medicine name,Active substance,Agency product number,Date of first decision,Disease / condition,EU designation number,Status of orphan designation,First published,Revision date,URL,Product,Indication,Sponsor,Designation date,Tradename - EU product # - Implemented on
0,,etilefrine,,2002-11-13 01:00:00,Treatment of low-flow priapism,EU/3/02/122,Positive,2002-12-13 01:09:21,2013-06-25 16:15:00,https://www.ema.europa.eu/en/medicines/human/o...,etilefrin,Treatment of low flow priapism,Laboratoires SERB,13 Nov 2002,-
1,,recombinant glycoprotein gp350 of epstein-barr...,,2002-10-22 00:00:00,Prevention of post-transplantation lympho-prol...,EU/3/02/118,Positive,2002-12-17 01:09:00,2002-12-17 01:09:00,https://www.ema.europa.eu/en/medicines/human/o...,recombinant glycoprotein gp350 of epstein-barr...,Prevention of post transplantation lympho-prol...,Henogen S.A.,22 Oct 2002,-
2,"Cerepro,",adenovirus-mediated herpes-simplex-virus thymi...,EMEA/H/C/000694,2002-02-06 01:00:00,Treatment of high-grade glioma with subsequent...,EU/3/01/083,Positive,2003-01-06 02:00:00,2016-08-15 18:40:00,https://www.ema.europa.eu/en/medicines/human/o...,adenovirus-mediated herpes simplex virus-thymi...,Treatment of high-grade glioma with subsequent...,Boyd Consultants Limited,06 Feb 2002,-
3,"Prohippur,","benzoic acid, sodium salt",EMEA/H/C/004150,2002-09-11 00:00:00,Treatment of non-ketotic hyperglycinaemia,EU/3/02/111,Positive,2003-01-08 01:09:00,2003-01-08 01:09:00,https://www.ema.europa.eu/en/medicines/human/o...,"benzoic acid, sodium salt",Treatment of non-ketotic hyperglycinaemia,Ethicare GmbH,11 Sep 2002,-
4,,autologous renal cell tumour vaccine,,2002-10-21 00:00:00,Treatment of renal-cell carcinoma,EU/3/02/116,Positive,2003-01-08 01:09:00,2003-01-08 01:09:00,https://www.ema.europa.eu/en/medicines/human/o...,autologous renal cell tumor vaccine,Treatment of renal cell carcinoma,Liponova GmbH,21 Oct 2002,-
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1663,,(+)-epicatechin,,2020-06-26 00:00:00,Treatment of Becker muscular dystrophy,EU/3/20/2293,Positive,2020-09-23 15:45:00,NaT,https://www.ema.europa.eu/en/medicines/human/o...,(+)-epicatechin,Treatment of Becker muscular dystrophy,MWB Consulting S.A.R.L.,26 Jun 2020,-
1664,,sodium cromoglicate,,2020-06-26 00:00:00,Treatment of idiopathic pulmonary fibrosis,EU/3/20/2294,Positive,2020-09-23 15:45:00,NaT,https://www.ema.europa.eu/en/medicines/human/o...,sodium cromoglicate,Treatment of idiopathic pulmonary fibrosis,IQVIA RDS Spain S.L.,26 Jun 2020,-
1665,,lys40(nodaga-68ga)nh2-exendin-4,,2020-06-26 00:00:00,Diagnosis of insulinoma,EU/3/20/2295,Positive,2020-09-24 12:32:00,NaT,https://www.ema.europa.eu/en/medicines/human/o...,lys40(nodaga-68ga)nh2-exendin-4,Diagnosis of insulinoma,Stichting Katholieke Universiteit,26 Jun 2020,-
1666,,axicabtagene ciloleucel,,2020-06-26 00:00:00,Treatment of marginal zone lymphoma,EU/3/20/2296,Positive,2020-09-24 14:40:00,NaT,https://www.ema.europa.eu/en/medicines/human/o...,axicabtagene ciloleucel,Treatment of marginal zone lymphoma,Kite Pharma EU B.V.,26 Jun 2020,-


### Import dataset of all drugs from https://www.ema.europa.eu/en/medicines/download-medicine-data (EPAR)

In [35]:
all_drugs_data = pd.read_excel('data/all_drugs_ema.xlsx', header=8)
print(all_drugs_data['Category'].unique())
print(all_drugs_data['Orphan medicine'].unique())
all_drugs_data

['Veterinary' 'Human']


Unnamed: 0,Category,Medicine name,Therapeutic area,International non-proprietary name (INN) / common name,Active substance,Product number,Patient safety,Authorisation status,ATC code,Additional monitoring,...,Vet pharmacotherapeutic group,Date of opinion,Decision date,Revision number,Condition / indication,Species,ATCvet code,First published,Revision date,URL
0,Veterinary,Frontpro (previously known as Afoxolaner Merial),,afoxolaner,afoxolaner,EMEA/V/C/005126,no,Authorised,,no,...,Ectoparasiticides for systemic use,2019-03-21 01:00:00,2020-11-06 01:00:00,3.0,Treatment of flea (Ctenocephalides felis and C...,Dogs,QP53BE01,2019-06-07 17:00:00,2020-11-12 18:18:00,https://www.ema.europa.eu/en/medicines/veterin...
1,Human,Cholib,Dyslipidemias,"fenofibrate, simvastatin","fenofibrate, simvastatin",EMEA/H/C/002559,no,Authorised,C10BA04,no,...,,2013-06-27 00:00:00,2020-10-23 00:00:00,12.0,Cholib is indicated as adjunctive therapy to d...,,,2018-08-20 00:00:00,2020-11-12 17:50:00,https://www.ema.europa.eu/en/medicines/human/E...
2,Human,Repaglinide Krka,"Diabetes Mellitus, Type 2",repaglinide,repaglinide,EMEA/H/C/001066,no,Authorised,A10BX02,no,...,,2009-07-23 00:00:00,2020-10-28 01:00:00,6.0,Repaglinide is indicated in patients with type...,,,2017-10-27 00:00:00,2020-11-12 17:30:00,https://www.ema.europa.eu/en/medicines/human/E...
3,Human,Liprolog,Diabetes Mellitus,insulin lispro,insulin lispro,EMEA/H/C/000393,no,Authorised,"A10AB04, A10AD04",no,...,,2001-04-26 00:00:00,2020-09-04 00:00:00,28.0,For the treatment of adults and children with ...,,,2017-10-23 00:00:00,2020-11-12 16:54:00,https://www.ema.europa.eu/en/medicines/human/E...
4,Human,Hexacima,"Hepatitis B, Tetanus, Immunization, Meningitis...","diphtheria, tetanus, pertussis (acellular, com...","diphtheria toxoid / tetanus toxoid, two-compon...",EMEA/H/C/002702,no,Authorised,J07CA09,no,...,,2013-02-21 01:00:00,2020-09-24 00:00:00,21.0,Hexacima (DTaP-IPV-HB-Hib) is indicated for pr...,,,2018-01-08 12:30:00,2020-11-12 16:42:00,https://www.ema.europa.eu/en/medicines/human/E...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1730,Human,Liprolog,Diabetes Mellitus,insulin lispro,insulin lispro,EMEA/H/C/000143,no,Withdrawn,A10AB04,no,...,,NaT,2001-02-19 01:00:00,0.0,For the treatment of patients with diabetes me...,,,2001-02-19 01:00:00,2001-08-01 00:00:00,https://www.ema.europa.eu/en/medicines/human/E...
1731,Human,EchoGen,Echocardiography,dodecafluoropentane,dodecafluoropentane,EMEA/H/C/000149,no,Withdrawn,V08DA,no,...,,NaT,2001-01-22 01:00:00,0.0,EchoGen is a transpulmonary echocardiographic ...,,,2001-01-22 01:00:00,2001-05-22 00:00:00,https://www.ema.europa.eu/en/medicines/human/E...
1732,Human,Ecokinase,Myocardial Infarction,reteplase,reteplase,EMEA/H/C/000106,no,Withdrawn,B01AD07,no,...,,NaT,1999-07-30 00:00:00,0.0,Thrombolytic therapy of acute myocardial infar...,,,1999-07-30 00:00:00,2000-12-12 01:00:00,https://www.ema.europa.eu/en/medicines/human/E...
1733,Human,Primavax,"Hepatitis B, Tetanus, Immunization, Diphtheria","diphtheria, tetanus and hepatitis B vaccine, a...","diphtheria toxoid purified, hepatitis B, recom...",EMEA/H/C/000156,no,Withdrawn,J07CA,no,...,,NaT,2000-07-27 00:00:00,0.0,This vaccine is indicated for active immunizat...,,,2000-07-27 00:00:00,2000-12-04 01:00:00,https://www.ema.europa.eu/en/medicines/human/E...


### Keep only the human drugs

In [36]:
all_drugs_data = all_drugs_data[all_drugs_data['Category'] == 'Human']
all_drugs_data = all_drugs_data.drop(columns=['Category', 'Species', 'Vet pharmacotherapeutic group', 'Additional monitoring', 'Patient safety', 'Generic', 'Biosimilar', 'ATCvet code', 'ATC code'])
all_drugs_data

Unnamed: 0,Medicine name,Therapeutic area,International non-proprietary name (INN) / common name,Active substance,Product number,Authorisation status,Conditional approval,Exceptional circumstances,Accelerated assessment,Orphan medicine,...,Date of refusal of marketing authorisation,Marketing authorisation holder/company name,Human pharmacotherapeutic group,Date of opinion,Decision date,Revision number,Condition / indication,First published,Revision date,URL
1,Cholib,Dyslipidemias,"fenofibrate, simvastatin","fenofibrate, simvastatin",EMEA/H/C/002559,Authorised,no,no,no,no,...,NaT,Mylan IRE Healthcare Ltd,Lipid modifying agents,2013-06-27 00:00:00,2020-10-23 00:00:00,12.0,Cholib is indicated as adjunctive therapy to d...,2018-08-20 00:00:00,2020-11-12 17:50:00,https://www.ema.europa.eu/en/medicines/human/E...
2,Repaglinide Krka,"Diabetes Mellitus, Type 2",repaglinide,repaglinide,EMEA/H/C/001066,Authorised,no,no,no,no,...,NaT,"Krka, d.d., Novo mesto","Drugs used in diabetes,",2009-07-23 00:00:00,2020-10-28 01:00:00,6.0,Repaglinide is indicated in patients with type...,2017-10-27 00:00:00,2020-11-12 17:30:00,https://www.ema.europa.eu/en/medicines/human/E...
3,Liprolog,Diabetes Mellitus,insulin lispro,insulin lispro,EMEA/H/C/000393,Authorised,no,no,no,no,...,NaT,Eli Lilly Nederland B.V.,"Drugs used in diabetes,",2001-04-26 00:00:00,2020-09-04 00:00:00,28.0,For the treatment of adults and children with ...,2017-10-23 00:00:00,2020-11-12 16:54:00,https://www.ema.europa.eu/en/medicines/human/E...
4,Hexacima,"Hepatitis B, Tetanus, Immunization, Meningitis...","diphtheria, tetanus, pertussis (acellular, com...","diphtheria toxoid / tetanus toxoid, two-compon...",EMEA/H/C/002702,Authorised,no,no,no,no,...,2013-02-22 01:00:00,Sanofi Pasteur,"Vaccines, , Bacterial and viral vaccines, comb...",2013-02-21 01:00:00,2020-09-24 00:00:00,21.0,Hexacima (DTaP-IPV-HB-Hib) is indicated for pr...,2018-01-08 12:30:00,2020-11-12 16:42:00,https://www.ema.europa.eu/en/medicines/human/E...
5,Semglee,Diabetes Mellitus,insulin glargine,insulin glargine,EMEA/H/C/004280,Authorised,no,no,no,no,...,NaT,Mylan S.A.S,"Drugs used in diabetes,",2018-01-25 01:00:00,2020-10-07 00:00:00,5.0,"Treatment of diabetes mellitus in adults, adol...",2018-04-03 12:07:00,2020-11-12 13:51:00,https://www.ema.europa.eu/en/medicines/human/E...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1730,Liprolog,Diabetes Mellitus,insulin lispro,insulin lispro,EMEA/H/C/000143,Withdrawn,no,no,no,no,...,NaT,Eli Lilly and Company Limited,"Drugs used in diabetes, , Insulins and analogu...",NaT,2001-02-19 01:00:00,0.0,For the treatment of patients with diabetes me...,2001-02-19 01:00:00,2001-08-01 00:00:00,https://www.ema.europa.eu/en/medicines/human/E...
1731,EchoGen,Echocardiography,dodecafluoropentane,dodecafluoropentane,EMEA/H/C/000149,Withdrawn,no,no,no,no,...,NaT,Sonus Pharmaceuticals Ltd.,Contrast media,NaT,2001-01-22 01:00:00,0.0,EchoGen is a transpulmonary echocardiographic ...,2001-01-22 01:00:00,2001-05-22 00:00:00,https://www.ema.europa.eu/en/medicines/human/E...
1732,Ecokinase,Myocardial Infarction,reteplase,reteplase,EMEA/H/C/000106,Withdrawn,no,no,no,no,...,NaT,Roche Registration Ltd.,"Antithrombotic agents,",NaT,1999-07-30 00:00:00,0.0,Thrombolytic therapy of acute myocardial infar...,1999-07-30 00:00:00,2000-12-12 01:00:00,https://www.ema.europa.eu/en/medicines/human/E...
1733,Primavax,"Hepatitis B, Tetanus, Immunization, Diphtheria","diphtheria, tetanus and hepatitis B vaccine, a...","diphtheria toxoid purified, hepatitis B, recom...",EMEA/H/C/000156,Withdrawn,no,no,no,no,...,NaT,Pasteur Mérieux MSD,"Vaccines,",NaT,2000-07-27 00:00:00,0.0,This vaccine is indicated for active immunizat...,2000-07-27 00:00:00,2000-12-04 01:00:00,https://www.ema.europa.eu/en/medicines/human/E...


### Keep only orphan drugs

In [38]:
all_drugs_data = all_drugs_data[all_drugs_data['Orphan medicine'] == 'yes']
all_drugs_data

Unnamed: 0,Medicine name,Therapeutic area,International non-proprietary name (INN) / common name,Active substance,Product number,Authorisation status,Conditional approval,Exceptional circumstances,Accelerated assessment,Orphan medicine,...,Date of refusal of marketing authorisation,Marketing authorisation holder/company name,Human pharmacotherapeutic group,Date of opinion,Decision date,Revision number,Condition / indication,First published,Revision date,URL
14,Xermelo,"Carcinoid Tumor, Neuroendocrine Tumors",telotristat ethyl,telotristat etiprate,EMEA/H/C/003937,Authorised,no,no,no,yes,...,NaT,Ipsen Pharma,Other alimentary tract and metabolism products,2017-07-19 00:00:00,2020-11-09 01:00:00,11.0,Xermelo is indicated for the treatment of carc...,2018-04-19 12:01:00,2020-11-11 18:16:00,https://www.ema.europa.eu/en/medicines/human/E...
30,Zolgensma,"Muscular Atrophy, Spinal",onasemnogene abeparvovec,onasemnogene abeparvovec,EMEA/H/C/004750,Authorised,yes,no,no,yes,...,NaT,Novartis Gene Therapies EU Limited,Other drugs for disorders of the musculo-skele...,2020-03-26 01:00:00,2020-10-15 00:00:00,1.0,"Zolgensma is indicated for the treatment of:, ...",2020-05-27 15:00:00,2020-11-09 17:56:00,https://www.ema.europa.eu/en/medicines/human/E...
36,Ayvakyt,Gastrointestinal Stromal Tumors,avapritinib,avapritinib,EMEA/H/C/005208,Authorised,no,no,no,yes,...,NaT,Blueprint Medicines (Netherlands) B.V.,"Other antineoplastic agents, Protein kinase in...",2020-07-23 00:00:00,2020-10-15 00:00:00,1.0,Ayvakyt is indicated as monotherapy for the tr...,2020-09-30 09:30:00,2020-11-06 17:28:00,https://www.ema.europa.eu/en/medicines/human/E...
40,Zejula,"Fallopian Tube Neoplasms, Peritoneal Neoplasms...",niraparib,Niraparib (tosylate monohydrate),EMEA/H/C/004249,Authorised,no,no,no,yes,...,NaT,GlaxoSmithKline (Ireland) Limited,"Antineoplastic agents,",2017-09-14 00:00:00,2020-10-27 01:00:00,11.0,"Zejula is indicated:, , as monotherapy for the...",2017-12-19 17:47:00,2020-11-06 15:49:00,https://www.ema.europa.eu/en/medicines/human/E...
43,Defitelio,Hepatic Veno-Occlusive Disease,defibrotide,defibrotide,EMEA/H/C/002393,Authorised,no,yes,no,yes,...,NaT,Gentium S.r.l.,"Antithrombotic agents,",2013-07-25 00:00:00,2020-10-14 00:00:00,12.0,Defitelio is indicated for the treatment of se...,2017-04-21 12:12:00,2020-11-05 17:16:00,https://www.ema.europa.eu/en/medicines/human/E...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1585,Istodax,"Lymphoma, Non-Hodgkin",romidepsin,romidepsin,EMEA/H/C/002122,Refused,no,no,no,yes,...,2013-02-12 01:00:00,Celgene Europe Ltd.,"Antineoplastic agents,",2012-11-15 01:00:00,2012-07-20 00:00:00,0.0,"treatment of peripheral T-cell lymphoma (PTCL),",2012-07-20 15:02:00,2013-03-14 10:08:00,https://www.ema.europa.eu/en/medicines/human/E...
1590,Elelyso,Gaucher Disease,taliglucerase alfa,Taliglucerase alfa,EMEA/H/C/002250,Refused,no,no,no,yes,...,2012-10-25 00:00:00,Pfizer Ltd.,,2012-07-03 00:00:00,2012-10-29 01:00:00,0.0,"treatment of Gaucher disease,",2012-10-29 13:07:00,2012-11-23 15:34:00,https://www.ema.europa.eu/en/medicines/human/E...
1610,Folotyn,"Lymphoma, T-Cell",pralatrexate,Pralatrexate,EMEA/H/C/002096,Refused,no,no,no,yes,...,2012-06-21 00:00:00,Allos Therapeutics Ltd,"Antineoplastic agents,",2012-04-19 00:00:00,2012-01-20 01:00:00,0.0,"treatment of peripheral T-cell lymphoma,",2012-01-20 13:03:00,2012-07-11 10:05:00,https://www.ema.europa.eu/en/medicines/human/E...
1652,Sovrima,Friedreich Ataxia,idebenone,idebenone,EMEA/H/C/000908,Refused,no,no,no,yes,...,2009-04-27 00:00:00,Centocor B.V.,"Psychoanaleptics,",2008-11-20 01:00:00,2009-12-31 01:00:00,0.0,Treatment of Friedreich’s Ataxia,2009-12-31 02:00:00,2009-12-31 02:00:00,https://www.ema.europa.eu/en/medicines/human/E...


### How many authorised orphan drugs are there?

In [39]:
len(all_drugs_data[all_drugs_data['Authorisation status'] == 'Authorised'])

111

There are far fewer authorised orphan medicines (111) than orphan designations, which are granted per active substance (`matched_data` has 1,667 rows).

### How many unique Active substances?

In [42]:
unique_active_substances = matched_data['Active substance'].unique()
len(unique_active_substances)

1373

### How many designations does each substance have?

In [45]:
# Count designations per active substance, keeping first-appearance order
counts = matched_data.groupby('Active substance', sort=False).size()

# Keep only substances with more than one designation
df_number_designations = (
    counts[counts > 1]
    .rename_axis('Active Substance')
    .reset_index(name='Number designations')
)
df_number_designations

Unnamed: 0,Active Substance,Number designations
0,miltefosine,3
1,antisense oligonucleotide (tatccggagggctcgccat...,3
2,recombinant antibody derivative against human ...,2
3,n-(methyl-diazacyclohexyl-methylbenzamide)-aza...,3
4,midostaurin,2
...,...,...
183,"6,8-bis(benzylthio)octanoic acid",2
184,rozanolixizumab,2
185,allogeneic cultured postnatal thymus-derived t...,3
186,"sodium benzoate, sodium phenylacetate",2


### What is the maximum number of designations?

In [46]:
df_number_designations['Number designations'].unique()

array([ 3,  2,  4,  6,  5,  7,  9, 12])

In [47]:
df_number_designations[df_number_designations['Number designations'] == 12]

Unnamed: 0,Active Substance,Number designations
144,sodium benzoate,12
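
The substance with the most designations can also be read off directly, without listing the unique counts first. A minimal sketch, assuming a hypothetical toy Series `substances` in place of `matched_data['Active substance']`:

```python
import pandas as pd

# Hypothetical toy column; the labels are placeholders, not real substances.
substances = pd.Series(['a', 'b', 'a', 'c', 'a', 'b'], name='Active substance')

counts = substances.value_counts()   # sorted by count, largest first
top_substance = counts.idxmax()      # label with the highest count
top_count = int(counts.max())
repeated = counts[counts > 1]        # substances with more than one designation
print(top_substance, top_count)
```

Because `value_counts()` sorts in descending order by default, `counts.head()` gives the most frequently designated substances at a glance.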
