This is an analysis of tumor incidence in seven global regions (the WHO regions, with the Americas split into USA/Canada and Latin America), using data from the CI5-Xd database.

First, we'll need to import some libraries.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Next, we'll define where the data are stored. The registry list should be a CSV file with the following columns:

    (1) A numeric code denoting the identity of the registry, corresponding to the name of the CSV file with data from that registry.
    (2) A text description of the registry.
    (3) A numeric code listing which region the registry corresponds to: 
         1- 'AFR'
         2- 'AMR-L'
         3- 'AMR-US/C'
         4- 'EMR'
         5- 'WPR'
         6- 'SEAR'
         7- 'EUR'
    (4) A numeric code denoting the World Bank income level of the country the registry covers:
        1- Low income
        2- Lower middle income
        3- Upper middle income
        4- High income
    (5) A numeric flag denoting whether to use the registry in the calculation. 0- Don't use. 1- Use.
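As a rough illustration of this layout, the sketch below parses a tiny in-memory registry list and decodes the numeric columns. The two data rows are taken from the registry table shown in this notebook; the header row in the sample, the names `REGION_NAMES` and `INCOME_NAMES`, and the added `Region name` and `Income name` columns are assumptions made for the example only.

```python
import io
import pandas as pd

# Hypothetical excerpt of a registry list: one header row, then data rows.
# The header text is an assumption; it is skipped on import anyway.
sample_csv = io.StringIO(
    "code,description,region,income,use\n"
    '18000299,"Uganda, Kyadondo County (2003-2007)",1,1,1\n'
    '14540199,"Malawi, Blantyre (2003-2007)",1,1,0\n'
)

# Lookup tables for the numeric codes described above (illustrative names).
REGION_NAMES = {1: 'AFR', 2: 'AMR-L', 3: 'AMR-US/C', 4: 'EMR',
                5: 'WPR', 6: 'SEAR', 7: 'EUR'}
INCOME_NAMES = {1: 'Low', 2: 'Lower middle', 3: 'Upper middle', 4: 'High'}

registry = pd.read_csv(
    sample_csv, skiprows=1,
    names=['Reg #', 'Description', 'Region', 'Income', 'Use'])

# Decode the numeric codes into readable labels.
registry['Region name'] = registry['Region'].map(REGION_NAMES)
registry['Income name'] = registry['Income'].map(INCOME_NAMES)

# Keep only the registries flagged for use.
usable = registry[registry['Use'] == 1]
```

Mapping the codes through dictionaries keeps the raw numeric columns intact, so later filters on `Region` and `Use` still work.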

In [2]:
regionList = ['AFR', 'AMR-L', 'AMR-US/C', 'EMR', 'WPR', 'SEAR', 'EUR']
dataDir = './CI5-Xd/CSV/'
registryList = './CI5-Xd/registryCSV.csv'

Import the registry list as a dataframe, skipping the first row and supplying our own column names.

In [3]:
df = pd.read_csv(registryList, skiprows=1, names=['Reg #', 'Description', 'Region', 'Income', 'Use'])
df.iloc[1:9]

Unnamed: 0,Reg #,Description,Region,Income,Use
1,14340199,"Libya, Benghazi (2003-2005)",1,3,0
2,14540199,"Malawi, Blantyre (2003-2007)",1,1,0
3,17100199,"South Africa, PROMEC (2003-2007)",1,3,0
4,17160270,"Zimbabwe, Harare: African (2003-2006)",1,1,0
5,17880299,"Tunisia, North (2003-2005)",1,2,0
6,18000299,"Uganda, Kyadondo County (2003-2007)",1,1,1
7,18180299,"Egypt, Gharbiah (2003-2007)",1,2,0
8,20320199,"Argentina, Bahía Blanca (2003-2007)",2,3,1


Define what age range and tumor types we'll use. 

Age groups are coded 1-19, corresponding to ages 0-4, 5-9, 10-14, 15-19, ..., 80-84, 85+, and Unknown.

Sex is coded 1/male, 2/female.
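To make the age coding concrete, here is a minimal sketch that turns a group code into its age-range label, assuming the five-year bands listed above. The helper `age_group_label` is a hypothetical name introduced for this example and is not part of the CI5 data.

```python
def age_group_label(code):
    """Return the age-range label for an age-group code in 1..19."""
    if not 1 <= code <= 19:
        raise ValueError(f'age-group code out of range: {code}')
    if code == 19:
        return 'Unknown'
    if code == 18:
        return '85+'
    low = 5 * (code - 1)          # code 1 -> 0, code 4 -> 15, code 17 -> 80
    return f'{low}-{low + 4}'

# Labels for codes 4 through 18, i.e. ages 15 and above.
labels = [age_group_label(c) for c in range(4, 19)]
```

Under this coding, codes 4 through 18 span ages 15-19 up to 85+, and code 19 (Unknown) is left out.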

Tumors are described in cancer.txt. Relevant codes are:

    173 Eye, brain and central nervous system (C69-72)
    174 Eye (C69)
    175 	Retinoblastoma
    176 	Melanoma
    177 	Squamous cell carcinoma
    178 	Other specified carcinoma
    179 	Unspecified carcinoma
    180 	Sarcoma
    181 	Other morphology
    182 	Unspecified morphology
    183 Meninges (C70)
    184 Central nervous system (C71-72)
    185 	Astrocytic tumours
    186 	Oligodendroglial tumours and mixed gliomas
    187 	Ependymal tumours
    188 	Gliomas of uncertain origin
    189 	Medulloblastoma
    190 	Other embryonal tumours
    191 	Other neuroepithelial tumours
    192 	Other specified morphology
    193 	Unspecified morphology
    194 Brain (C71)
    195 Other parts of central nervous system (C72)
    196 	Spinal cord, cauda equina (C72.0,1)
    197 	Cranial nerves (C72.2-5)
    198 	Nervous system, NOS (C72.8-9)

In [8]:
useAges   = [4,5,6,7,8,9,10,11,12,13,14,15,16,17,18] # Use ages 15-85+
useSexes  = [1,2]                                    # Use both male and female
useTumors = [183, 184]                               # Meninges (C70), CNS (C71-72)
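One way these selections might be applied is sketched below on a small made-up table. The column names `Sex`, `Tumor`, `Age`, and `Cases`, and every number in `incidence`, are assumptions for illustration; the actual layout of the per-registry CSV files is not shown in this notebook and should be checked against the CI5 documentation.

```python
import pandas as pd

useAges = list(range(4, 19))   # age-group codes 4..18 (ages 15 to 85+)
useSexes = [1, 2]              # male and female
useTumors = [183, 184]         # Meninges (C70), CNS (C71-72)

# Hypothetical incidence rows; values are invented for the example.
incidence = pd.DataFrame({
    'Sex':   [1,   1,   2,   2,   2,   1],
    'Tumor': [183, 184, 184, 175, 183, 184],
    'Age':   [4,   4,   3,   10,  18,  19],
    'Cases': [2,   5,   7,   1,   4,   3],
})

# Keep rows that match all three selections.
mask = (incidence['Age'].isin(useAges)
        & incidence['Sex'].isin(useSexes)
        & incidence['Tumor'].isin(useTumors))
selected = incidence[mask]

total_cases = int(selected['Cases'].sum())
```

In this toy table the row with age code 3, the retinoblastoma row (175), and the row with unknown age (19) are excluded.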

In [5]:
# Loop through all the regions we need to analyze (region codes start at 1)
for regionN, regionName in enumerate(regionList, start=1):
    print(regionName)
    # Registries belonging to this region that are flagged for use
    regFrame = df[(df['Region'] == regionN) & (df['Use'] > 0)]

AFR
AMR-L
AMR-US/C
EMR
WPR
SEAR
EUR
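Since each registry code corresponds to the name of that registry's CSV file, the per-region selection can be turned into a list of file paths. The sketch below assumes the files are named `<code>.csv` directly inside `dataDir`; that naming pattern, the helper `registry_paths`, and the three-row `df` built here (rows copied from the registry table above) are illustrative assumptions.

```python
import os
import pandas as pd

dataDir = './CI5-Xd/CSV/'

# Small stand-in for the registry dataframe, using rows shown earlier.
df = pd.DataFrame({
    'Reg #':  [14540199, 18000299, 20320199],
    'Region': [1, 1, 2],
    'Use':    [0, 1, 1],
})

def registry_paths(frame, regionN, data_dir):
    """Return CSV paths for the usable registries in one region."""
    selected = frame[(frame['Region'] == regionN) & (frame['Use'] > 0)]
    # Assumed naming scheme: the registry code followed by '.csv'.
    return [os.path.join(data_dir, f'{code}.csv') for code in selected['Reg #']]

paths = registry_paths(df, 1, dataDir)
```

Each path could then be passed to `pd.read_csv` inside the region loop.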


In [6]:
regionList

['AFR', 'AMR-L', 'AMR-US/C', 'EMR', 'WPR', 'SEAR', 'EUR']

In [7]:
regFrame.size  # .size is an attribute (rows x columns), not a method