Teaching datasets for Lecture Series on Open Reproducible Science in R

OxfordIHTM/teaching_datasets

License for data

This repository contains a collection of datasets for teaching R. Although it was created specifically for the Open Reproducible Science in R lecture series, the data made available here is open for access and use by anyone and is distributed under a Creative Commons CC0 1.0 Universal (CC0) license.

Dataset description

| File Name | File Type | File Description | Epidemiology/Statistics Usage |
|---|---|---|---|
| ba.dat | DAT | The dataset from Bland JM, Altman DG. Statistical Methods for Assessing Agreement Between Two Methods of Clinical Measurement. The Lancet. 1986;1: 307–310. | Useful for learning and practicing how to compare diagnostic tests using the Bland and Altman approach and the Bland and Altman plot |
| bateman.dat | DAT | On Saturday, 21st April 1990, a luncheon was held in the home of Jean Bateman. There was a total of forty-five guests, which included thirty-five members of the Department of Epidemiology and Population Sciences at the London School of Hygiene and Tropical Medicine. On Sunday morning, 22nd April 1990, Jean awoke with symptoms of gastrointestinal illness; her husband awoke with similar symptoms. The possibility of an outbreak related to the luncheon was strengthened when several of the guests telephoned Jean on Sunday and reported illness. On Monday, 23rd April 1990, there was an unusually large number of department members absent from work and reporting illness. Data from this outbreak is stored in this dataset. | Useful for learning and practicing how to perform logistic regression |
| ca.dat | DAT | A dataset on the survival of cancer patients in two different treatment groups | Useful for learning and practicing how to perform survival analysis |
| cover.dat | DAT | The dataset contains data from a coverage survey for a therapeutic feeding program (TFP) in central Malawi undertaken in March 2003. Data were collected using the centric systematic area sampling method to define sampling locations: a number of communities located closest to the centres of thirty 10 x 10 kilometre grid squares were sampled using active (investigative) case-finding. | Useful for learning and practicing how to analyse survey data and how to perform basic spatial analysis |
| diets.dat | DAT | The dataset contains data from a trial of two different diets undertaken at an adult therapeutic feeding centre in Somalia. | Useful for learning and practicing statistical tests for a difference in means between two groups |
| fem.dat | DAT | A dataset from 118 female psychiatric patients | Useful for learning and practicing various statistical tests, linear regression, logistic regression, and linear modelling |
| fem.xlsx | XLSX | A dataset from 118 female psychiatric patients | Useful for learning and practicing various statistical tests, linear regression, logistic regression, and linear modelling |
| gudhiv.dat | DAT | This data is from a cross-sectional study of 435 male patients who presented with sexually transmitted infections at an outpatient clinic in The Gambia between August 1988 and June 1990. | Useful for learning and practicing logistic regression |
| koko_plus_coverage.csv | CSV | Dataset from a coverage survey of Koko+ in Eastern Ghana | Useful for learning how to analyse survey data, perform basic spatial analysis, and perform comparative analysis for evaluating programme performance |
| malaria.dat | DAT | A dataset that contains data on rainfall (in mm) and the number of cases of malaria reported from health centres in an administrative district of Ethiopia between July 1997 and July 1999 | Useful for learning and practicing time series analysis and plotting |
| nut.dat | DAT | | Useful for learning and practicing how to analyse survey data |
| octe.dat | DAT | This data is from a matched case-control study investigating the association between oral contraceptive use and thromboembolism. The cases are 175 women aged between 15 and 44 years admitted to hospital for thromboembolism and discharged alive. The controls are female patients admitted for conditions believed to be unrelated to oral contraceptive use. Cases and controls were matched on age, ethnic group, marital status, parity, income, place of residence, and date of hospitalisation. | Useful for learning and practicing how to analyse a matched case-control study |
| pop.dat | DAT | A dataset that contains data on the age (in months) and sex of 438 children aged between six and sixty months collected as part of a nutritional anthropometry survey of the Khosh Valley in Northeast Afghanistan. | Useful for learning and practicing how to create various plots including a population pyramid plot |
| salex.dat | DAT | This data comes from a food-borne outbreak. On Saturday 17th October 1992, eighty-two people attended a buffet meal at a sports club. Within fourteen to twenty-four hours, fifty-one of the participants developed diarrhoea, with nausea, vomiting, abdominal pain and fever. | Useful for learning and practicing how to calculate relative risks and odds ratios |
| school_nutrition.csv | CSV | A dataset from a nutrition survey of school children 10 years and older from Pakistan. | Useful for learning how to analyse survey data |
| school_nutrition.xlsx | XLSX | A dataset from a nutrition survey of school children 10 years and older from Pakistan. | Useful for learning how to analyse survey data |
| south_wollo_coverage.csv | CSV | A dataset from a Community-based Management of Acute Malnutrition (CMAM) programme in South Wollo Zone, Ethiopia | Useful for learning how to analyse survey data and perform basic spatial analysis |
| sssw.dat | DAT | This dataset contains data on the marital status, home circumstances, and ethnic group of 152 persons recruited into a study of the levels of stress experienced by student social workers in the United Kingdom. | Useful for learning how to analyse survey data and how to use various plots for exploratory data analysis |
| tsstamp.dat | DAT | This data is from a matched case-control study investigating the association between the use of different brands of tampon and toxic shock syndrome undertaken during an outbreak. Only a subset of the original dataset is used here. | Useful for learning and practicing how to perform logistic regression and stratified analysis |
| waste.dat | DAT | The dataset contains the location of twenty-three recent cases of childhood cancer in a 5 by 5 km square surrounding an industrial waste disposal site. | Useful for learning and practicing computer simulation to test spatial clustering |
| whz.dat | DAT | | |
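
The files listed above come in three formats (DAT, CSV, XLSX), so a small loader that picks a reader from the file extension may be convenient. The following is only a sketch under stated assumptions: `read_teaching_data()` is a hypothetical helper that is not part of this repository, the `.dat` files are assumed (not confirmed here) to be whitespace-delimited text with a header row, and reading `.xlsx` files assumes the `readxl` package is installed.

```r
# Hypothetical helper, not part of the repository: choose a reader
# based on the file extension.
read_teaching_data <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(
    ext,
    # Assumption: .dat files are whitespace-delimited with a header row
    dat = read.table(path, header = TRUE),
    csv = read.csv(path),
    xlsx = {
      # Assumption: the readxl package is available for Excel files
      if (!requireNamespace("readxl", quietly = TRUE)) {
        stop("Reading .xlsx files needs the 'readxl' package")
      }
      as.data.frame(readxl::read_excel(path))
    },
    # Fallback for any extension not handled above
    stop("Unsupported file type: ", ext)
  )
}

# Usage, assuming fem.dat has been downloaded to the working directory:
# fem <- read_teaching_data("fem.dat")
# str(fem)
```

If a `.dat` file turns out to use a different delimiter or to lack a header row, the `read.table()` arguments would need adjusting; inspecting the first few lines with `readLines(path, n = 5)` is a quick way to check.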
