# Ecotox Data Documentation

This is an initial documentation of the Ecotox database.

Working with it, we expect to learn more about:

- Which combinations used in tests cause harm to particular families (or another taxonomic rank)
- How to predict effects from the materials and procedures used

The goal of this study is to avoid or minimize the need for animal testing.

Data source:

https://cfpub.epa.gov/ecotox/


ECOTOX Support

U.S. Environmental Protection Agency

Office of Research and Development

National Health and Environmental Effects Research Laboratory Mid-Continent Ecology Division (MED)

6201 Congdon Boulevard

Duluth, Minnesota 55804

Telephone: 218-529-5225

Fax: 218-529-5003

E-mail: ecotox.support@epa.gov

## Data tables

- TESTS
  - Information pertaining to the experimental design.
- CHEMICAL_CARRIERS
  - Information pertaining to the carrier and/or positive control chemicals reported for the test.
- RESULTS
  - Information pertaining to the endpoint or non-endpoint result or dose-response summary (also NR endpoint).
- MEDIA_CHARACTERISTICS
  - Water chemistry and media characteristics parameters. One-to-one relation with the RESULTS table.
- DOSES
  - Information pertaining to the dose-response dose.
- DOSE_RESPONSES
  - Parent dose-response record containing sample size, effect measurement, response site, observation duration, etc.
- DOSE_RESPONSE_DETAILS
  - Detail dose-response record, one for each response value by dose.
- DOSE_RESPONSE_LINKS
  - Ties dose response to its NR endpoint result summary record.
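
Each table is distributed as a pipe-delimited text file. Below is a minimal sketch for loading several of them at once. It assumes the file for each table is named after the table in lowercase (e.g. `doses.txt`, `tests.txt`) inside the extracted ASCII directory. Only `doses`, `dose_responses`, `results` and `tests` are confirmed by this document. The helper name `load_tables` is hypothetical.

```python
from pathlib import Path

import pandas as pd


def load_tables(base_dir, names):
    """Read each pipe-delimited ECOTOX table into a dict of DataFrames.

    Assumption: the file for table NAME is BASE_DIR/name.txt (lowercase).
    Every column is read as text so identifiers are not silently turned
    into numbers and mixed-type columns do not trigger dtype warnings.
    """
    base = Path(base_dir)
    return {
        name: pd.read_csv(base / f"{name.lower()}.txt", sep="|", dtype=str)
        for name in names
    }


# Hypothetical usage with the directory used in this notebook:
# tables = load_tables("ecotox_ascii_12_13_2018",
#                      ["TESTS", "RESULTS", "DOSES", "DOSE_RESPONSES"])
```

Returning a dict keyed by the table names used in the list above keeps the later joins readable (`tables["TESTS"]`, `tables["DOSES"]`).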


```python
import pandas as pd
import numpy as np
```

The `doses` table is linked to the `tests` table by `test_id`.

```python
# Pipe-delimited file; read every column as text (str).
df_dose = pd.read_csv("ecotox_ascii_12_13_2018/doses.txt", sep="|", dtype=str)
df_dose.loc[:5, ['test_id', 'dose_id', 'dose_conc_unit', 'dose1_mean', 'dose2_mean', 'dose3_mean']]
```

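|   | test_id | dose_id | dose_conc_unit | dose1_mean | dose2_mean | dose3_mean |
|---|---------|---------|----------------|------------|------------|------------|
| 0 | 1 | 1 | mg/kg | 0 | | |
| 1 | 1 | 2 | mg/kg | 5 | | |
| 2 | 1 | 3 | mg/kg | 25 | | |
| 3 | 1 | 4 | mg/kg | 125 | | |
| 4 | 2 | 5 | mg/kg | 0 | | |
| 5 | 2 | 6 | mg/kg | 150 | | |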
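
Because every column is read as text, the dose columns must be converted before any numeric analysis. Below is a minimal sketch using `pd.to_numeric` with `errors="coerce"`, which turns blank or unparseable entries into `NaN` instead of raising. It assumes the `dose*_mean` columns hold plain numbers, as in the excerpt above. The helper name `numeric_doses` is hypothetical.

```python
import pandas as pd


def numeric_doses(df, columns=("dose1_mean", "dose2_mean", "dose3_mean")):
    """Return a copy of df with the given dose columns converted to numbers.

    Values that cannot be parsed (blank cells, text qualifiers, ...) become
    NaN, so it is worth checking how many NaNs appear before dropping them.
    """
    out = df.copy()
    for col in columns:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out


# Rows for test_id 1 from the excerpt above; empty cells become None.
sample = pd.DataFrame({
    "test_id": ["1", "1", "1", "1"],
    "dose_conc_unit": ["mg/kg"] * 4,
    "dose1_mean": ["0", "5", "25", "125"],
    "dose2_mean": [None] * 4,
    "dose3_mean": [None] * 4,
})
print(numeric_doses(sample)["dose1_mean"].tolist())  # [0, 5, 25, 125]
```

The unit is stored per row in `dose_conc_unit`, so numeric doses are only comparable between rows that share the same unit.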


The `dose_responses` table holds the effects observed in each test, and the effect codes are explained in the `effect_codes` table.

```python
df_dose_responses = pd.read_csv("ecotox_ascii_12_13_2018/dose_responses.txt", sep="|", dtype=str)
df_dose_responses.loc[:5, ['test_id', 'effect_code']]
```

|   | test_id | effect_code |
|---|---------|-------------|
| 0 | 1 | ENZ |
| 1 | 1 | ENZ |
| 2 | 1 | HRM |
| 3 | 2 | ENZ |
| 4 | 2 | ENZ |
| 5 | 2 | ENZ |


```python
# Code lookup tables live in the validation/ subdirectory.
df_effect_codes = pd.read_csv("ecotox_ascii_12_13_2018/validation/effect_codes.txt", sep="|", dtype=str)
df_effect_codes.loc[:5]
```

|   | code | description |
|---|------|-------------|
| 0 | -- | Unspecified |
| 1 | ACC | Accumulation |
| 2 | AVO | Avoidance |
| 3 | BCM | Biochemistry |
| 4 | BEH | Behavior |
| 5 | CEL | Cell(s) |
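
The effect codes in `dose_responses` can be translated with the `effect_codes` table. Below is a minimal sketch using `Series.map` with a code-to-description dictionary built from the six codes in the excerpt above. Codes missing from the lookup come back as `NaN`, which makes gaps easy to spot. The helper name `describe_effects` and the toy response rows are hypothetical.

```python
import pandas as pd


def describe_effects(responses, effect_codes):
    """Add an effect_description column by looking up effect_code.

    effect_codes must have 'code' and 'description' columns, as in
    validation/effect_codes.txt; unknown codes map to NaN.
    """
    lookup = dict(zip(effect_codes["code"], effect_codes["description"]))
    out = responses.copy()
    out["effect_description"] = out["effect_code"].map(lookup)
    return out


# The six codes shown in the excerpt above.
codes = pd.DataFrame({
    "code": ["--", "ACC", "AVO", "BCM", "BEH", "CEL"],
    "description": ["Unspecified", "Accumulation", "Avoidance",
                    "Biochemistry", "Behavior", "Cell(s)"],
})
# Toy responses: ENZ is not among the six codes loaded here, so it maps to NaN.
responses = pd.DataFrame({"test_id": ["1", "2", "3"],
                          "effect_code": ["ACC", "BEH", "ENZ"]})
print(describe_effects(responses, codes)["effect_description"].tolist())
# ['Accumulation', 'Behavior', nan]
```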


The `results` table holds the test effects by `test_id`. This is the main key of the database, and it is linked to the animals' family through the `tests` table.

```python
df_results = pd.read_csv("ecotox_ascii_12_13_2018/results.txt", sep="|", dtype=str)
df_results.loc[:5, ['test_id', 'effect']]
```

|   | test_id | effect |
|---|---------|--------|
| 0 | 1143197 | MOR |
| 1 | 1047376 | POP |
| 2 | 1152742 | POP |
| 3 | 1101244 | ITX |
| 4 | 1210976 | CEL |
| 5 | 1035811 | MLT |


```python
df_tests = pd.read_csv("ecotox_ascii_12_13_2018/tests.txt", sep="|", dtype=str)
df_tests.loc[:5, ['test_id', 'species_number']]
```

|   | test_id | species_number |
|---|---------|----------------|
| 0 | 1 | 4906 |
| 1 | 2 | 4906 |
| 2 | 3 | 4510 |
| 3 | 4 | 4435 |
| 4 | 5 | 4435 |
| 5 | 6 | 4435 |
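
The `results` and `doses` tables share `test_id` with `tests`, and `tests` carries `species_number`. Reaching the family takes one more lookup. The sketch below assumes a species table with `species_number` and `family` columns, for example `validation/species.txt` in the same download, which this document does not show. The function name `add_family` and the placeholder family names are hypothetical.

```python
import pandas as pd


def add_family(results, tests, species):
    """Attach the taxonomic family to each row of `results`.

    results -> tests   on test_id        (several results per test)
    tests   -> species on species_number (several tests per species)
    Left joins keep rows whose test or species has no match (family = NaN);
    validate="many_to_one" raises if a key is duplicated on the right side.
    """
    linked = results.merge(tests[["test_id", "species_number"]],
                           on="test_id", how="left", validate="many_to_one")
    return linked.merge(species[["species_number", "family"]],
                        on="species_number", how="left",
                        validate="many_to_one")


# test_id/species_number pairs come from the `tests` excerpt above;
# the family names are placeholders, not real ECOTOX lookups.
tests = pd.DataFrame({"test_id": ["1", "2", "3"],
                      "species_number": ["4906", "4906", "4510"]})
species = pd.DataFrame({"species_number": ["4906", "4510"],
                        "family": ["FamilyA", "FamilyB"]})
results = pd.DataFrame({"test_id": ["1", "3", "3"],
                        "effect": ["ENZ", "MOR", "POP"]})
print(add_family(results, tests, species)["family"].tolist())
# ['FamilyA', 'FamilyB', 'FamilyB']
```

Reading both key columns as text, as in the cells above, keeps `test_id` and `species_number` the same type on both sides of each join.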


Below is an example of the data after a simple transformation that makes it more consistent and concise:

```python
df_ds = pd.read_csv("dataset_dose.csv")
df_ds.loc[:20]
```

|   | test_id | family | chemical | dose | effect |
|---|---------|--------|----------|------|--------|
| 0 | 13 | Megascolecidae | 2-Propanone | 1.0 | Accumulation |
| 1 | 22 | Muridae | Corn oil | 1.0 | Mortality |
| 2 | 22 | Muridae | Corn oil | 2.0 | Mortality |
| 3 | 23 | Canidae | Corn oil | 0.5 | Mortality |
| 4 | 23 | Canidae | Corn oil | 1.0 | Mortality |
| 5 | 23 | Canidae | Corn oil | 1.5 | Mortality |
| 6 | 50 | Mustelidae | 2-Propanone | 0.0 | Feeding behavior |
| 7 | 50 | Mustelidae | 2-Propanone | 0.0 | Growth |
| 8 | 50 | Mustelidae | 2-Propanone | 0.0 | Hormone(s) |
| 9 | 50 | Mustelidae | 2-Propanone | 0.0 | Morphology |
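
In this shape, the first question from the introduction (which combinations affect which families) becomes a grouping. Below is a minimal sketch that counts records per family, chemical and effect, using the ten rows displayed above. It reports both row counts and distinct tests, because one `test_id` with several doses contributes several rows.

```python
import pandas as pd

# The ten rows displayed above from dataset_dose.csv.
df_ds = pd.DataFrame({
    "test_id": [13, 22, 22, 23, 23, 23, 50, 50, 50, 50],
    "family": ["Megascolecidae", "Muridae", "Muridae", "Canidae", "Canidae",
               "Canidae", "Mustelidae", "Mustelidae", "Mustelidae",
               "Mustelidae"],
    "chemical": ["2-Propanone", "Corn oil", "Corn oil", "Corn oil",
                 "Corn oil", "Corn oil", "2-Propanone", "2-Propanone",
                 "2-Propanone", "2-Propanone"],
    "dose": [1.0, 1.0, 2.0, 0.5, 1.0, 1.5, 0.0, 0.0, 0.0, 0.0],
    "effect": ["Accumulation", "Mortality", "Mortality", "Mortality",
               "Mortality", "Mortality", "Feeding behavior", "Growth",
               "Hormone(s)", "Morphology"],
})

# Rows per (family, chemical, effect) and the number of distinct tests behind them.
summary = (df_ds.groupby(["family", "chemical", "effect"])
                .agg(rows=("test_id", "size"), tests=("test_id", "nunique"))
                .reset_index()
                .sort_values("rows", ascending=False))
print(summary)
```

On the full dataset, the `tests` column is the safer measure of how often a combination was studied, since repeated doses inflate `rows`.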
