# Brian S Caffo
<hr>

| Professor |
|:---  |
| [Department of Biostatistics](https://publichealth.jhu.edu/departments/biostatistics), [Johns Hopkins University](https://www.jhu.edu) (primary) |
| [Department of Biomedical Engineering](https://www.bme.jhu.edu/), [Johns Hopkins University](https://www.jhu.edu) (courtesy) |
| [www.bcaffo.com](https://www.bcaffo.com), [CV repo](https://github.com/bcaffo/cv), [CV hosted version](https://bcaffo.github.io/cv/cvJupyter.html) |

# Part I
## Summary
Brian Caffo, PhD, is a professor in the Department of Biostatistics
with a secondary appointment in the Department of Biomedical
Engineering at Johns Hopkins University. He received his PhD from the
University of Florida Department of Statistics in 2001. He has worked
in statistical computing, statistical modeling, computational
statistics, multivariate and decomposition methods, and statistics in
neuroimaging and neuroscience. He led teams that won the 2011 ADHD 200
prediction competition. He co-directs the SMART statistical
group. With other faculty at JHU, he created and co-directs the
Coursera Data Science Specialization, a 10-course specialization on
statistical data analysis. He co-directs the JHU Data Science Lab, a
group dedicated to open educational innovation and data science. He is
the former director of the Biostatistics graduate programs and
admissions committees. He is currently co-director of the Johns
Hopkins High Performance Computing Exchange supercomputing service
center and past president of the Bloomberg School of Public Health
faculty senate.

## Education and training

| Year | Description | Institution | Title |
|:---  |:---         | :--- | :--- |
| 2006 | K25 training grant | NIH | *A mentored training program in imaging science* |
| 2001 | PhD in statistics | U of Florida | *Candidate sampling schemes and some important applications* |
| 1998 | MS in statistics|  U of Florida | |
| 1995 | Dual BS in mathematics and statistics | U of Florida | |


In [1]:
!pip install -U kaleido &> /dev/null

In [3]:
import pandas as pd
import plotly.express as px
import numpy as np
#import wordcloud as wc
#import stylecloud as sc
import matplotlib.pyplot as plt
import os
## IPython's Image displays static plotly exports (see the commented-out
## fig.to_image calls below). PIL's Image is imported under a different
## name so that it no longer shadows the IPython one.
from IPython.display import Image
import plotly.graph_objects as go
import plotly.io as pio
import itertools
from PIL import Image as PILImage
import kaleido

## pio.renderers.default = "plotly_mimetype+notebook"
## pio.renderers.default = "plotly_mimetype+notebook_connected"
## This allows for pdf rendering
## pio.renderers.default = "plotly_mimetype+notebook+pdf"
## pio.kaleido.scope.mathjax = None

#static = True
static = False

## The default figure height and width
height = 400
width = 600

## To render to PDF, run
## quarto render cvJupyter.qmd --to pdf
pd.set_option("display.max_rows", 999)
#dat = pd.read_csv("publications_01042022_2.csv")
#dat = pd.read_csv("publications_12072022.csv")
#dat = pd.read_csv("https://raw.githubusercontent.com/bcaffo/cv/master/publications_01152024.csv")
dat = pd.read_csv("https://raw.githubusercontent.com/bcaffo/cv/refs/heads/master/publications_01282025.csv")

## The Scopus export's column names change between exports. These are the
## columns the cells below need; recheck this mapping every year.
dat = dat.rename(columns = {
    'Authors'      : 'Authors',
    'Year'         : 'Publication Year',
    'Title'        : 'Document Title',
    'Source title' : 'Journal Title',
    'Cited by'     : 'Citations'
})
## Papers with no recorded citations have a missing value; count them as 0
dat['Citations'] = dat['Citations'].fillna(0)
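The rename only works if the export still uses the column names `Year`, `Title`, `Source title`, and `Cited by`. Below is a minimal sketch of a guard that fails loudly when a new export does not match. The helper `check_columns` and the `REQUIRED` list are hypothetical additions, with the list read off the `hover_data` columns that the plots use.

```python
import pandas as pd

## Hypothetical list of columns the plotting cells rely on after the rename
REQUIRED = ['Authors', 'Publication Year', 'Document Title',
            'Journal Title', 'Citations']

def check_columns(df: pd.DataFrame, required=REQUIRED) -> None:
    """Raise a KeyError naming every required column missing from df."""
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise KeyError(f"missing columns after rename: {missing}")

## Toy one-row frame that already has every required column
toy = pd.DataFrame({'Authors': ['Caffo B.'], 'Publication Year': [2024],
                    'Document Title': ['A paper'], 'Journal Title': ['A journal'],
                    'Citations': [0]})
check_columns(toy)   # passes silently
```

Calling `check_columns(dat)` right after the rename would surface a changed export immediately, rather than as a plotly error several cells later.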

## Professional experience
Relevant professional experience.


In [4]:
profExp = pd.read_csv("https://raw.githubusercontent.com/bcaffo/cv/master/profExp.txt", delimiter="|")
profExp['Start'] = pd.to_datetime("01/01/"+profExp['Start'].astype(str))
profExp['End'] = pd.to_datetime("12/31/"+profExp['End'].astype(str))
profExp = profExp.sort_values(by = ['Start', 'End'])
profExp = profExp.assign(Position=profExp['Title']+" "+profExp['Place'])


fig = px.timeline(profExp, x_start="Start", x_end="End",
                  y='Position',
                  color="Organization",
                  height= 400)
fig = fig.update_layout({
    'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
    'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})

fig.update_yaxes(autorange="reversed")

fig.show(warn = False)

In [None]:
profExp.drop(['Unnamed: 0'], axis = 1)

| | Start | End | Title | Place | Organization | Position |
|:---|:---|:---|:---|:---|:---|:---|
| 10 | 1996-01-01 | 1999-12-31 | Research assistant for professor Alan Agresti | Depart. of Stat. | UFL | Research assistant for professor Alan Agresti ... |
| 11 | 1996-01-01 | 1999-12-31 | Intern / programmer | Pediatric Oncology Group | UFL | Intern / programmer Pediatric Oncology Group |
| 9 | 2001-01-01 | 2007-12-31 | Assistant professor | Dept. of Biostat | JHU | Assistant professor Dept. of Biostat |
| 8 | 2007-01-01 | 2013-12-31 | Associate professor | Dept. of Biostat | JHU | Associate professor Dept. of Biostat |
| 6 | 2011-01-01 | 2025-12-31 | Founding co-director | SMART research group | JHU | Founding co-director SMART research group |
| 7 | 2013-01-01 | 2025-12-31 | Full professor | Dept. of Biostat | JHU | Full professor Dept. of Biostat |
| 5 | 2014-01-01 | 2025-12-31 | Co-founding member | Johns Hopkins Data Science Lab | JHU | Co-founding member Johns Hopkins Data Science Lab |
| 3 | 2016-01-01 | 2025-12-31 | Faculty member | Kavli Neuroscience Discovery Institute | JHU | Faculty member Kavli Neuroscience Discovery In... |
| 4 | 2017-01-01 | 2025-12-31 | Faculty member | Malone Center for Engineering and Healthcare | JHU | Faculty member Malone Center for Engineering a... |
| 1 | 2019-01-01 | 2025-12-31 | Secondary | Dept. of Biomedical Engineering | JHU | Secondary Dept. of Biomedical Engineering |


## Professional activities

| Year | Activity |
| :--- | :---     |
| 2005-2006 |Publications Officer for the Biometrics Section of the American Statistical Association |
| 2010 | Founding member Stat in Imaging ASA Section |
| 2010-2011 | Secretary Stat in Imaging ASA Section |





## Editorial activities

| Year | Activity |
| :--- | :---     |
| 2006-2008 | Associate editor for Computational Statistics and Data Analysis |
| 2008-2010 | Associate editor for the Journal of the American Statistical Association |
| 2009-2012 | Associate editor for the Journal of the Royal Statistical Society Series B |
| 2010-2012 | Associate editor for Biometrics |
| 2011 | Senior program committee member for the Fourteenth International Conference on Artificial Intelligence and Statistics |
| 2016 | Guest associate editor for Frontiers in Neuroscience special issues on Brain Imaging Methods |
| 2021 | Guest associate editor for a Frontiers special issue on Explainable Artificial Intelligence in Healthcare and Finance |
| 2024 | Editorial board member for Data Science in Science |
| 2024-2025 | Guest associate editor for a Frontiers special issue on Integrating Data Science with Organoid Research for Advanced Biocomputing |

Here are the NIH study sections I have served on. I do not include internal, NSF, or EU study sections, of which I have done a small number.


In [None]:
grant_reviews = pd.read_csv("https://raw.githubusercontent.com/bcaffo/cv/master/nih_reviews.txt")
grant_reviews

| | Study section | Title | Date |
|:---|:---|:---|:---|
| 0 | NIGMS | Sure-First | 3/25/2024 |
| 1 | DNDA | ANIE | 10/26/2023 |
| 2 | ZDA | Workshops on Computational and Analytical Res... | 1/30/2023 |
| 3 | ZEY | NEI Clinical Applications | 7/15/2022 |
| 4 | ZEY | Large Scale Epidemiology and Secondary Data A... | 10/20/2021 |
| 5 | ZRG1 | Healthcare Delivery and Methodologies | 6/27/2018 |
| 6 | ZNS1 | NeuroNEXT2 | 3/23/2018 |
| 7 | NPAS | Neural Basis of Psychopathology Addictions an... | 10/19/2017 |
| 8 | ZMH1 | Interventions/Biomarkers Special Emphasis Panel | 2/3/2017 |
| 9 | ZMH1 | Research Education Programs (R25) | 10/13/2016 |


## Honors and awards

| Year | Award |
| :--- | :--- |
| 1998 | William S. Mendenhall Award |
| 1999 | Anderson Scholar/Faculty nominee for the University of Florida CLAS |
| 2001 | University of Florida CLAS Dissertation Fellowship |
| 2001 | University of Florida Statistics Faculty Award |
| 2002 | Johns Hopkins Faculty Innovation Award |
| 2006 | Johns Hopkins Bloomberg School of Public Health AMTRA award |
| 2008 | Johns Hopkins Bloomberg School of Public Health Golden Apple teaching award |
| 2011 | Leader and organizer of the declared winning entry of the 2011 ADHD200 prediction competition |
| 2011 | Presidential Early Career Award for Scientists and Engineers (PECASE, 2010, awarded in 2011); *The highest honor bestowed by the United States government on science and engineering professionals in the early stages of their independent research careers* |
| 2014 | Named a Fellow of the American Statistical Association |
| 2015 | Special Invited Lecturer, European Meeting of Statisticians |
| 2022 | Adrienne Cupples award; *This annual award recognizes a biostatistician whose academic achievements reflect the contributions to teaching, research, and service exemplified by [Professor L. Adrienne Cupples](https://www.bu.edu/sph/news/articles/2022/l-adrienne-cupples-in-memoriam/)* |
| 2024 | Johns Hopkins Bloomberg School of Public Health AMTRA award |

## Publications

Publications as reported in Scopus as of 1/30/2025; Scopus lists 269 publications for me in total. Below is a plot of publications by year, where each small rectangle is one publication.
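Because every publication is drawn as its own rectangle, each year's bar height is just that year's publication count, and the heights sum to the total. A toy sketch of that arithmetic, using hypothetical data in place of `dat`:

```python
import pandas as pd

## Hypothetical toy stand-in for `dat`; the real rows come from the Scopus export
pubs = pd.DataFrame({
    'Publication Year': [2001, 2001, 2002, 2004, 2004, 2004],
    'Document Title':   ['A', 'B', 'C', 'D', 'E', 'F'],
})

## One rectangle per publication means each year's bar height is simply
## the number of rows for that year
per_year = pubs['Publication Year'].value_counts().sort_index()
total = int(per_year.sum())
```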


In [None]:
## Add a Count of 1 per publication; assign returns a new frame, so dat is unchanged
temp = dat.assign(Count = 1)


fig = px.bar(temp, x = 'Publication Year',
                 y = 'Count',
                 color = 'Document Title',
                 hover_data = ['Publication Year', 'Document Title', 'Journal Title', 'Authors', 'Citations'])


fig = fig.update_layout({
    'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
    'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})

fig.update_layout(showlegend=False)

fig.show()

In [None]:
## to_html returns a string; write_html saves the figure to a file
fig.write_html("publications.html")


Here are the journals I publish in most often (those with more than five of my publications).


In [None]:
## Number of publications per journal (value_counts sorts by count, descending)
temp = dat['Journal Title'].value_counts().reset_index()
temp = temp.rename(columns = { "count" : "Count"})

## Keep journals with more than 5 publications and attach each paper's details
temp = temp[temp['Count'] > 5]
temp = temp.merge(dat, on = 'Journal Title', how = 'left')
## One row per paper, so each paper draws as one rectangle
temp = temp.assign(Count = 1)



fig = px.bar(temp, x = 'Journal Title',
                 y = 'Count',
                 color = 'Document Title',
                 hover_data = ['Publication Year', 'Document Title', 'Journal Title', 'Authors', 'Citations'])
fig = fig.update_layout({
    'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
    'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})

fig.update_layout(showlegend=False)

fig.show()

According to Scopus, I have published with 793 coauthors. Here are the coauthors with whom I have published more than seven manuscripts.
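The coauthor counts come from splitting each semicolon-separated `Authors` string. Below is a minimal sketch that assumes the `"Last I.; Last I."` format the next cell parses and uses made-up author strings; `coauthor_counts` is a hypothetical helper. It strips the space after each `;`, since otherwise the same person can show up as two different strings depending on whether they are listed first. Counting distinct name strings only approximates distinct people, because one person may appear with different initials.

```python
from collections import Counter

## Made-up author strings in the assumed Scopus "Last I.; Last I." style
author_lists = [
    "Caffo B.; Crainiceanu C.M.; Lindquist M.A.",
    "Crainiceanu C.M.; Caffo B.",
    "Caffo B.; Lindquist M.A.; Crainiceanu C.M.",
]

def coauthor_counts(author_lists, me="Caffo"):
    """Count papers per coauthor, excluding any name containing `me`."""
    counts = Counter()
    for s in author_lists:
        ## Strip the space after each ';' so one person always gives one string,
        ## and use a set so a duplicated name counts once per paper
        names = {name.strip() for name in s.split(';') if name.strip()}
        counts.update(name for name in names if me not in name)
    return counts

counts = coauthor_counts(author_lists)
n_coauthors = len(counts)                                  # distinct name strings
frequent = {name for name, n in counts.items() if n > 2}   # "more than N" filter
```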


In [None]:
## Split each semicolon-separated author list into names like "Crainiceanu C.M.",
## stripping the space after each ';' so every name is spelled consistently
text = [name.strip() for s in dat['Authors'] for name in s.split(';')]
authors = pd.Series(text).value_counts()
## Keep coauthors (excluding Caffo) with more than 7 manuscripts
authors = authors[(authors > 7) & ~authors.index.str.contains('Caffo')]
authors = list(authors.index)

## Copy the data so the helper columns below don't modify dat
authorDF = dat.copy()

## Add one boolean column per frequent coauthor: True if they are on that paper
for author in authors:
    authorDF[author] = authorDF['Authors'].str.contains(author, regex = False)

## Drop papers that include none of the frequent coauthors
authorDF = authorDF[authorDF[authors].any(axis = 1)]

## Melt to long format: one row per (manuscript, coauthor) pair
authorDF = authorDF.melt(id_vars = ['Publication Year', 'Document Title', 'Journal Title', 'Authors', 'Citations'],
                         value_vars = authors,
                         var_name = 'Author',
                         value_name = 'Included')

## Keep only the pairs where the coauthor is on the manuscript
authorDF = authorDF[authorDF['Included'] == True].copy()
authorDF['Count'] = 1

authorDF['Last name'] = [name.split()[0] for name in authorDF['Author']]

authorDF.sort_values(by = ['Last name', 'Publication Year'], inplace = True)

fig = px.bar(authorDF, x = 'Last name',
                 y = 'Count',
                 color = 'Document Title',
                 hover_data = ['Publication Year', 'Document Title', 'Journal Title', 'Authors', 'Citations'])
fig = fig.update_layout({
    'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
    'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})

fig.update_layout(showlegend=False)
fig.update_layout(xaxis={'categoryorder':'total descending'})


fig.show()


Here is a plot of the number of authors on each manuscript against my position in the author list.
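The next cell finds my position with `x.index('Caffo')`, which raises a `ValueError` for any paper where that last name is absent (for example, a consortium authorship). Below is a hedged sketch of a more forgiving version. `author_position` and `position_label` are hypothetical helpers, and the `"Last I.; Last I."` author format is an assumption.

```python
def author_position(author_string, me="Caffo"):
    """Return (number of authors, my 1-based position), or None if I'm not listed."""
    ## First whitespace-separated token of each name is taken as the last name
    last_names = [name.split()[0] for name in author_string.split(';') if name.strip()]
    if me not in last_names:
        return None
    return len(last_names), last_names.index(me) + 1

def position_label(n_authors, position):
    """Classify a 1-based author position as first, last, or middle."""
    if position == 1:
        return "first"
    if position == n_authors:
        return "last"
    return "middle"
```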


In [None]:
## Get a list of lists of last names
text = [s.split(';') for s in dat['Authors']]
lname= lambda namelist: [name.split()[0] for name in namelist]
text = [lname(x) for x in text]

authorno = [len(x) for x in text]
position = [x.index('Caffo') + 1 for x in text]

positionDF = pd.DataFrame({'# Authors' : authorno, 'Position' : position})


fig = px.scatter(positionDF.groupby(['# Authors', 'Position']).size().reset_index(name = 'Count'),
                 x = '# Authors',
                 y = 'Position',
                 color = 'Count',
                 size = 'Count')
fig = fig.update_layout({
    'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
    'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})

fig.show()

Here are the total citation counts of manuscripts, plotted by year of publication.
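Since each paper is stacked as its own segment, a year's bar height is the sum of that year's citation counts. Below is a toy sketch of that sum using hypothetical data. Missing counts are treated as 0, matching the `fillna(0)` applied when the data are loaded.

```python
import pandas as pd

## Hypothetical toy stand-in for `dat`
pubs = pd.DataFrame({
    'Publication Year': [2001, 2001, 2002],
    'Document Title':   ['A', 'B', 'C'],
    'Citations':        [10, 5.0, None],
})

## Missing citation counts count as 0
pubs['Citations'] = pubs['Citations'].fillna(0)

## Stacking one segment per paper makes each year's bar the sum of its papers
per_year = pubs.groupby('Publication Year')['Citations'].sum()
```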


In [None]:
## The citation column was already renamed to 'Citations' when the data were loaded
temp = dat
fig = px.bar(temp,
             x = 'Publication Year',
             y = 'Citations',
             color = 'Document Title',
             hover_data = ['Publication Year', 'Citations', 'Document Title', 'Journal Title'])

fig = fig.update_layout({
    'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
    'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})

fig.update_layout(showlegend=False)

fig.show()

# Part II
## Teaching
### Advisees
Dates are rounded to the nearest year, starting from the matriculation year. Includes advisees and co-advisees in formal degree programs or postdoctoral fellowships.
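The timeline cells for professional experience, advisees, classes, and service all convert integer years into date spans the same way, using `"01/01/"` and `"12/31/"` prefixes. Below is a minimal sketch of that conversion as one hypothetical helper, `year_span`. It builds ISO `YYYY-MM-DD` strings, which avoids any month/day ambiguity, and assumes the year columns have no missing values.

```python
import pandas as pd

def year_span(df, start="Start", end="End"):
    """Return a sorted copy of df with integer years widened to whole years:
    Start becomes January 1 and End becomes December 31 of that year."""
    out = df.copy()
    out[start] = pd.to_datetime(out[start].astype(int).astype(str) + "-01-01")
    out[end] = pd.to_datetime(out[end].astype(int).astype(str) + "-12-31")
    return out.sort_values(by=[start, end])

## Toy usage with placeholder advisee labels
spans = year_span(pd.DataFrame({'Advisee': ['B', 'A'],
                                'Start': [2006, 2001], 'End': [2011, 2005]}))
```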


In [None]:
advisees = pd.read_csv("https://raw.githubusercontent.com/bcaffo/cv/master/Advisees.txt", sep = "|")

advisees['Start'] = pd.to_datetime("01/01/"+advisees['Start'].astype(str))
advisees['End'] = pd.to_datetime("12/31/"+advisees['End'].astype(str))
advisees = advisees.sort_values(by = ['Start', 'End'])


fig = px.timeline(advisees,
                  x_start="Start",
                  x_end="End",
                  y='Advisee',
                  color="Degree",
                  height=700,
                  hover_data = ['Title', 'Notes'])

fig = fig.update_layout({
    'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
    'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})

fig.show()

In [None]:
advisees

| | Unnamed: 0 | End | Start | Degree | Advisee | Place | Title | Notes | Unnamed: 8 |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| 0 | | 2005-12-31 | 2001-01-01 | PhD | Leena Choi | JHU Biostat | *Modelling biomedical data and the foundation... | | |
| 4 | | 2008-12-31 | 2003-01-01 | PhD | Xianbin Li | JHU Biostat | *Modeling composite outcomes and their compon... | | |
| 5 | | 2008-12-31 | 2003-01-01 | PhD | Shu-Chih Su | JHU Biostat | *Structure/function relationships in the analy... | | |
| 1 | | 2006-12-31 | 2004-01-01 | ScM | Lijuan Deng | JHU Biostat | *Spline-based curve fitting with applications... | | |
| 2 | | 2006-12-31 | 2004-01-01 | MS | Bruce Swihart | University of Colorado Biostatistics | *Quantitative characterization of sleep archi... | co-advised with Naresh Punjabi and Gary Grunw... | |
| 3 | | 2007-12-31 | 2005-01-01 | MPH | Jeong Yun | JHU BSPH | *Incidence of hypertension in high risk group... | | |
| 7 | | 2011-12-31 | 2006-01-01 | PhD | Haley Hedlin | JHU Biostat | *Statistical methods for inter-subject analys... | | |
| 8 | | 2011-12-31 | 2006-01-01 | PhD | Bruce Swihart | JHU Biostat | *From individuals to populations: application... | | |
| 9 | | 2012-12-31 | 2007-01-01 | PhD | Jeff Goldsmith | JHU Biostat | *Cross-Sectional and longitudinal penalized f... | co-advised with primary advisor Ciprian Crain... | |
| 6 | | 2010-12-31 | 2008-01-01 | ScM | John Muschelli | JHU Biostat | *An iterative approach to hemodynamic respons... | | |


### Student exam participation
Excludes exams on which I served as an alternate.
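The exam plots count rows by year, department, and exam type. Below is a sketch of the same tally as a single cross-tabulation. It uses placeholder records shaped like `exams.csv` (`Year`, `Name`, `Department`, `Exam`), and the whitespace stripping mirrors the `str.strip()` calls in the first exam cell.

```python
import pandas as pd

## Placeholder per-student records in the assumed exams.csv shape
records = pd.DataFrame({
    'Year':       [2023, 2024, 2024, 2024],
    'Name':       ['Student A', 'Student B', 'Student C', 'Student D'],
    'Department': ['BME ', 'Biostat', 'BME', 'CS'],
    'Exam':       ['prelim', 'final ', 'prelim', 'prelim'],
})

## Strip stray whitespace so 'BME ' and 'BME' count as one department
for col in ['Department', 'Exam']:
    records[col] = records[col].str.strip()

## Rows: year; columns: exam type; cells: number of exams served on
table = pd.crosstab(records['Year'], records['Exam'])
```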


In [5]:
exams = pd.read_csv("https://raw.githubusercontent.com/bcaffo/cv/master/exams.csv")
exams['Exam'] = exams['Exam'].str.strip()
exams['Department'] = exams['Department'].str.strip()
exams = exams[ ['Year', 'Department', 'Exam'] ].value_counts().reset_index()
exams = exams.rename(columns = {'count' : 'Count'})

fig = px.bar(exams[ exams['Exam'] == "prelim"], x = "Year", y = "Count", color = "Department", title = "Prelim Exams / GBOs")

fig = fig.update_layout({
    'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
    'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})


## Static image
#Image(fig.to_image(format="png", width=600, height=300, scale=2))
## Interactive
fig.show()

In [12]:
## Re-read the raw records (one row per student exam) for the listing below.
## This replaces the aggregated `exams` table built for the prelim plot.
exams = pd.read_csv("https://raw.githubusercontent.com/bcaffo/cv/master/exams.csv")
exams

| | Year | Name | Department | Exam |
|:---|:---|:---|:---|:---|
| 0 | 2024 | Alana Chandler | BME | prelim |
| 1 | 2024 | Rafael Peixoto | BME | prelim |
| 2 | 2024 | Isabel Cachola | CS | prelim |
| 3 | 2024 | Marina Hernandez | Biostat | prelim |
| 4 | 2024 | Dongliang Zhang | Biostat | final |
| 5 | 2024 | Autumn Williams | BME | prelim |
| 6 | 2023 | Kalen Clifton | BME | prelim |
| 7 | 2023 | Victoria Bendersky | GTPCI | final |
| 8 | 2023 | Jianing Yao | Biostat | masters |
| 9 | 2023 | Carolyna Yamamoto | BME | prelim |


In [17]:
## List the students for each (year, exam type) as "Name (Department), ..."
## Uses the raw per-student `exams` records read in the previous cell
df = exams.copy()
df['Person'] = df['Name'] + " (" + df['Department'] + ")"

## groupby sorts by (Year, Exam) and keeps the original row order within each group
result_df = (df.groupby(['Year', 'Exam'])['Person']
               .agg(", ".join)
               .reset_index())
result_df


| | Year | Exam | Person |
|:---|:---|:---|:---|
| 0 | 2002 | prelim | Dongmei Liu (Biostat), Samuel Mills (PFH) |
| 1 | 2003 | prelim | Yi Huang (Biostat), Lin Zhang (Epi) |
| 2 | 2004 | final | Samuel Mills (PFH), Judy Ng (HPM) |
| 3 | 2004 | masters | Meh Fen Yeh (Biostat) |
| 4 | 2004 | prelim | Kenneth Brenneman (EHS), Elizabeth Johnson (Bi... |
| 5 | 2005 | final | Leena Choi (Biostat), Mike Griswold (Biostat),... |
| 6 | 2005 | masters | Brendan Click (Biostat), Jennifer Ryea (Biostat) |
| 7 | 2005 | prelim | Leslie Cromwell (HPM), Bin He (EHS) |
| 8 | 2006 | final | Hongfei Guo (Biostat), Bin He (EHS) |
| 9 | 2006 | masters | Ricardo Carvalho (GTPCI), Bruce Swihart (UC De... |


In [None]:
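## `exams` now holds the raw per-student records (re-read above), so count
## final exams per year and department before plotting
finals = (exams.assign(Exam = exams['Exam'].str.strip(),
                       Department = exams['Department'].str.strip())
               .query('Exam == "final"')
               .groupby(['Year', 'Department']).size()
               .reset_index(name = 'Count'))
fig = px.bar(finals, x = "Year", y = "Count", color = "Department", title = "Final Exams")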

fig = fig.update_layout({
    'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
    'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})

## Static
#Image(fig.to_image(format="png", width=600, height=300, scale=2))
## Interactive
fig.show()

In [None]:
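## `exams` now holds the raw per-student records (re-read above), so count
## masters readerships per year and department before plotting
masters = (exams.assign(Exam = exams['Exam'].str.strip(),
                        Department = exams['Department'].str.strip())
                .query('Exam == "masters"')
                .groupby(['Year', 'Department']).size()
                .reset_index(name = 'Count'))
fig = px.bar(masters,
             x = "Year", y = "Count", color = "Department", title = "Masters reader")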

fig = fig.update_layout({
    'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
    'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})

## Static
#Image(fig.to_image(format="png", width=600, height=300, scale=2))
## Interactive
fig.show()

### Classroom Instruction
Dates are rounded to the nearest year. The Data Science and Executive Data Science (EDS) specializations were co-taught with Roger Peng and Jeff Leek, and the Data Science Hackathon with Leah Jager, Jeff Leek, and Roger Peng. Guest lectures are not included.


In [None]:
classes = pd.read_csv("https://raw.githubusercontent.com/bcaffo/cv/master/classes.txt", delimiter="|").drop(['Unnamed: 0', ' '], axis = 1)
classes['Start'] = pd.to_datetime("01/01/"+classes['Start'].astype(str))
classes['End'] = pd.to_datetime("12/31/"+classes['End'].astype(str))
classes = classes.sort_values(by = ['Start', 'End'])


fig = px.timeline(classes, x_start="Start", x_end="End", y="Course title", color="Notes",
                 hover_data = ['Course title', 'Place', 'Notes'],
                  height=1000)
fig.update_yaxes(autorange="reversed")

fig = fig.update_layout({
    'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
    'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})

fig.show()

In [None]:
temp = classes.copy()
# Extract the year from the 'Start' and 'End' columns
temp['Start_Year'] = classes['Start'].dt.year
temp['End_Year'] = classes['End'].dt.year

temp.drop(['Start', 'End'], axis = 1)

| | Role | Course title | Place | Notes | Start_Year | End_Year |
|:---|:---|:---|:---|:---|:---|:---|
| 0 | Primary instructor | Advanced Statistical Computing | JHU | Regular course | 2001 | 2005 |
| 2 | Guest lecturer | Advanced Methods in Biostatistics II | JHU | Regular course | 2003 | 2003 |
| 1 | Primary instructor | Advanced Methods in Biostatistics IV | JHU | Regular course | 2003 | 2004 |
| 4 | Primary instructor | Statistical Computing | JHU | Regular course | 2003 | 2004 |
| 3 | Guest lecturer | Computing orientation and student comp club | JHU | Seminar course | 2003 | 2008 |
| 6 | Primary instructor | Advanced Methods in Biostatistics III | JHU | Regular course | 2004 | 2005 |
| 7 | Primary instructor | Methods in Biostatistics I | JHU | Regular course | 2005 | 2010 |
| 8 | Primary instructor | Methods in Biostatistics II | JHU | Regular course | 2005 | 2010 |
| 5 | Primary instructor | Statistical Computing | JHU | Regular course | 2008 | 2008 |
| 9 | Primary instructor | Medical Imaging Statistics | JHU | Regular course | 2008 | 2008 |


##### E-books
E-books are free and open access, except for *Methods in Biostatistics with R*. For all books, students get all subsequent version updates.

+ *Statistical Inference*, Leanpub
+ *Regression Models*, Leanpub
+ *Developing Data Products*, Leanpub
+ *Advanced Linear Models for Data Science*, Leanpub
+ *Methods in Biostatistics with R*, Leanpub, with John Muschelli, Ciprian Crainiceanu
+ *Executive Data Science*, Leanpub, with Roger Peng, Jeff Leek

##### Other
+ PI (role of executive producer, not instructor) for the BD2K R25 Genomic Data Science Specialization, fMRI 1 and 2 (Lindquist / Wager), Neurohacking in R (Crainiceanu, Sweeney, Muschelli), and Neuroscience for Neuroimaging (Baker)
+ swirl: mentored project by Nick Carchedi, initiated during his internship
+ Course notes for Biostatistics 140.651-2 listed on the Johns Hopkins Open Courseware project
+ YouTube channel (all educational content): 14k subscribers and over 400 videos; 6.4k views and about 300 hours of total watch time in the past 28 days

### Research grants


In [None]:
pigrants = pd.read_csv("https://raw.githubusercontent.com/bcaffo/cv/master/grants.txt", delimiter="|")
pigrants = pigrants.drop([' '], axis = 1)
pigrants = pigrants.assign(Number = np.arange(pigrants.shape[0]))
pigrants['Start'] = pd.to_datetime(pigrants['Start'])
pigrants['Finish'] = pd.to_datetime(pigrants['Finish'])

fig = px.timeline(pigrants, x_start="Start", x_end="Finish", y="Number", color="Organization",
                 hover_data = ['Role', 'Start', 'Finish', 'Organization', 'Mechanism', 'Title'])
fig.update_yaxes(autorange="reversed")

fig = fig.update_layout({
    'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
    'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})

fig.show()

In [None]:
pigrants

| | Role | Start | Finish | Organization | Mechanism | Title | Number |
|:---|:---|:---|:---|:---|:---|:---|:---|
| 0 | PI | 2002-01-01 | 2004-01-01 | JHBSPH | Faculty Innovation Award | Monte Carlo and Markov chain Monte Carlo Algor... | 0 |
| 1 | PI | 2006-05-01 | 2009-04-30 | NIH NIBIB | K25 EB003491 | A mentored training program in imaging science | 1 |
| 2 | PI | 2010-09-30 | 2014-08-31 | NIH NIBIB | R01 EB012547 | Statistical methods for hierarchical large n l... | 2 |
| 3 | Sub PI | 2011-09-01 | 2016-08-31 | NIH NIBIB | P41 EB015909 | Resource for quantitative functional MRI, TRD ... | 3 |
| 4 | PI | 2012-03-14 | 2014-03-14 | Amazon AWS | Cloud research grant | Cloud-Based Development of Neuroimaging Software | 4 |
| 5 | CoPI | 2012-08-27 | 2014-08-26 | JHU BSI | | The Center for Quantitative Neuroscience: a co... | 5 |
| 6 | PI | 2014-12-01 | 2017-11-30 | NIH | R25 EB020378 | Big Data Education for the Masses: MOOCs, Modu... | 6 |
| 7 | PI | 2019-08-16 | 2020-08-16 | NVIDIA | Hardware grant | GPU Accelerated Statistical Inference | 7 |
| 8 | PI | 2021-05-07 | 2025-03-21 | NIH NIBIB | R01 EB029977 | Statistical methods for structural and functio... | 8 |
| 9 | Sub PI | 2021-07-01 | 2026-04-30 | NIH NIBIB | P41 EB031771 | MRI Resource for Physiologic, Metabolic and An... | 9 |


### Co-investigator and subcontract awards
This list was surprisingly hard to compile and is likely incomplete; the titles and mechanisms below are my best reconstruction.


In [None]:
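grants = pd.read_csv("https://raw.githubusercontent.com/bcaffo/cv/master/grantsFull.csv")
grants = grants.assign(Number = np.arange(grants.shape[0]))
## The date strings don't share one inferable format; format = "mixed"
## (pandas >= 2.0) parses each element individually without a warning
grants['Start'] = pd.to_datetime(grants['Start'], format = "mixed")
grants['End'] = pd.to_datetime(grants['End'], format = "mixed")
## Label missing values so they still appear in the plots and tables
grants['Mechanism'] = grants['Mechanism'].fillna('Other')
grants['PI'] = grants['PI'].fillna('Other')
grants['YearlyDC'] = grants['YearlyDC'].fillna("No info")
## Truncate titles to 70 characters for the timeline's y-axis labels
grants['Title small'] = grants['Title'].str.slice(0, 70)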

fig = px.timeline(grants, x_start="Start", x_end="End", y="Title small", color = "Mechanism",
                  height=1000)
fig.update_yaxes(autorange="reversed")

fig = fig.update_layout({
    'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
    'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})
fig.show()




In [None]:
## Show the grants table with Start and End reduced to years

# Extract the year from the 'Start' and 'End' columns
grants['Start_Year'] = grants['Start'].dt.year
grants['End_Year'] = grants['End'].dt.year

# Display the grants DataFrame with modified Start and End columns
grants[['Start_Year', 'End_Year', 'Title', 'PI', 'Organization', 'Mechanism', 'YearlyDC']]


| | Start_Year | End_Year | Title | PI | Organization | Mechanism | YearlyDC |
|:---|:---|:---|:---|:---|:---|:---|:---|
| 0 | 2004 | 2008 | Coordinating Center For Sleep Heart Health | Samet | NIH | U01 | $304,820 |
| 1 | 2003 | 2008 | Statistical Methods For Environmental Epidemio... | Dominici | NIH | R01 | $250,000 |
| 2 | 2000 | 2006 | Brain Imaging and Cognition in Subjects at Ris... | Bassett | NIH | R01 | $497,475 |
| 3 | 2002 | 2007 | Corrective Image Reconstruction Methods for ECT | Tsui | NIH | R02 | $351,619 |
| 4 | 2005 | 2007 | Imaging Serotonergic Transmission in HIV Depres... | Pomper | NIH | R21 | $142,500 |
| 5 | 2006 | 2011 | Quantitative SPECT for Targeted Radionuclide T... | Frey | NIH | R01 | $296,418 |
| 6 | 2006 | 2009 | A Mentored Training Program in Quantitative Me... | Caffo | NIH | K25 | $372,906 |
| 7 | 2005 | 2008 | Aging, Lead Exposure, and Neurobehavioral Decline | Schwartz | NIH | R01 | $183,795 |
| 8 | 2007 | 2011 | Longitudinal Changes In Sleep Structure: Impli... | Punjabi | NIH | R01 | $225,000 |
| 9 | 2001 | 2006 | Disability in Parkinson’s Disease | Bassett | NIH | M01 | No info |


Here are the grant PIs I have worked with most often.


In [None]:
grants['Count'] = 1

fig = px.bar(grants,
             x = 'PI',
             y = 'Count',
             color = 'Title',
             hover_data = ['Start', 'End', 'Title', 'PI'])

fig = fig.update_layout({
    'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
    'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})

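fig.update_layout(showlegend=False)
## Order PIs from most to fewest grants
fig.update_layout(xaxis={'categoryorder':'total descending'})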

fig.show()

Here's a breakdown of grant mechanisms.


In [None]:
#mechanism = grants['Mechanism'].value_counts().reset_index()
#labels = mechanism['index']
#values = mechanism['Mechanism']
#fig = go.Figure(data=[go.Pie(labels=labels, values=values, hole=.6)])

#fig.update_traces(textinfo='value')

fig = px.bar(grants,
             x = 'Mechanism',
             y = 'Count',
             color = 'Title',
             hover_data = ['Mechanism', 'Start', 'End', 'Title', 'PI'])



fig = fig.update_layout({
    'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
    'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})

fig.update_layout(showlegend=False)

fig.show()

Here are grants plotted by start date against the base-10 logarithm of their yearly direct costs. Note that some grants show only the subcontract value, whereas others show the value of the parent grant.
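The loop below maps `"No info"` to a `-1` sentinel and then filters with `> 0`. Here is a vectorized sketch of the same conversion, assuming every known value is formatted like `"$304,820"`. Unparseable entries become `NaN` instead of a sentinel, so the plot filter would become `.notna()`. The toy values are hypothetical.

```python
import numpy as np
import pandas as pd

## Hypothetical YearlyDC values in the "$304,820" / "No info" style
ydc = pd.Series(["$304,820", "$1,000,000", "No info"])

## Remove "$" and thousands separators; non-numeric entries ("No info")
## are coerced to NaN rather than mapped to a -1 sentinel
dollars = pd.to_numeric(ydc.str.replace(r"[$,]", "", regex=True), errors="coerce")
log10_ydc = np.log10(dollars)
```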


In [None]:
YDC = []
for x in grants['YearlyDC']:
    if x != "No info":
        x = np.log10(float(x.replace("$", "").replace(",", "")))
        YDC.append(x)
    else :
        YDC.append(-1)
grants['Log10 YDC'] = YDC

fig = px.scatter(grants[(grants['Log10 YDC'] > 0)],
                 y = 'Log10 YDC',
                 x = 'Start',
                 color = 'Mechanism',
                 size = 'Log10 YDC')
fig = fig.update_layout({
    'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
    'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})

fig.show()

### Academic service
Here are my major service roles, rounded to the nearest year and grouped by the major organizational group each represents. I also do the usual other service (promotion committees and so on).


In [None]:
service = pd.read_csv("https://raw.githubusercontent.com/bcaffo/cv/master/service.txt", delimiter="|").drop(['Unnamed: 0', 'Unnamed: 5'], axis = 1)
service['Start'] = pd.to_datetime("01/01/"+service['Start'].astype(str))
service['End'] = pd.to_datetime("12/31/"+service['End'].astype(str))
service = service.sort_values(by = ['Start', 'End'])


fig = px.timeline(service, x_start="Start", x_end="End", y="Role", color="Group",
                 hover_data = ['Role', 'Group'],
                 height = 600)
fig.update_yaxes(autorange="reversed")

fig = fig.update_layout({
    'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
    'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})

fig.show()

In [None]:
## Show the service table with Start and End reduced to years

# Extract the year from the 'Start' and 'End' columns
service['Start_Year'] = service['Start'].dt.year
service['End_Year'] = service['End'].dt.year

# Display the service DataFrame with only the year
service[['Start_Year', 'End_Year', 'Role', 'Group']]


| | Start_Year | End_Year | Role | Group |
|:---|:---|:---|:---|:---|
| 19 | 2001 | 2002 | Seminar coordinator | Biostatistics |
| 18 | 2001 | 2009 | Information technology committee member | Biostatistics |
| 14 | 2002 | 2002 | Admissions committee | Biostatistics |
| 23 | 2002 | 2004 | Faculty Senate representative | JHBSPH |
| 21 | 2003 | 2003 | Co-organizer Junior Faculty Meetings | JHBSPH |
| 22 | 2007 | 2010 | CEDC member | JHBSPH |
| 20 | 2008 | 2010 | Co-director Biostat/Epi MPH concentration | MPH |
| 24 | 2009 | 2011 | Admissions Committee member | MPH |
| 25 | 2009 | 2011 | Executive Board member | MPH |
| 15 | 2009 | 2020 | Admissions committee | Biostatistics |


### Seminars
Here's a plot of the invited seminars I've logged. The list with presentation files can be found
[here](https://docs.google.com/spreadsheets/d/1mRC6xxZmNj3DnwwvCh_8GpErwhvJNq9gkRB3mQz1JIg/edit?usp=sharing).


In [18]:
seminars = pd.read_csv("https://docs.google.com/spreadsheets/d/1mRC6xxZmNj3DnwwvCh_8GpErwhvJNq9gkRB3mQz1JIg/export?format=csv&gid=0")
seminars = seminars.assign(Count = 1)
#seminarYear = seminars['Year'].value_counts().reset_index()
#seminarYear = seminarYear.rename(columns = {"index" : "Year", "Year" : "Count"}).sort_values("Year", ascending =False)
#fig = px.bar(seminarYear, x = "Year", y = "Count")
fig = px.bar(seminars, x = "Year", y = "Count", color = "Talk",
             hover_data = ['Year', 'Talk', 'Where'])
fig = fig.update_layout({
    'plot_bgcolor' : 'rgba(0, 0, 0, 0)',
    'paper_bgcolor' : 'rgba(0, 0, 0, 0)'})

fig.update_layout(showlegend=False)

## Static
#Image(fig.to_image(format="png", width=400, height=200, scale=2))
## Interactive
fig.show()

In [19]:
seminars

| | Year | Talk | Where | State | Country | Count |
|:---|:---|:---|:---|:---|:---|:---|
| 0 | 2001 | ESUP accept/reject sampling | NC State Statistics | NC | USA | 1 |
| 1 | 2001 | Monte Carlo exact conditional hypothesis tests... | AT&T Labs, Florham Park, New Jersey | NJ | USA | 1 |
| 2 | 2001 | Monte Carlo exact conditional hypothesis tests... | Fifth Workshop on Groebner Bases and Statistic... | LA | USA | 1 |
| 3 | 2001 | Monte Carlo exact conditional hypothesis tests... | Johns Hopkins University Department of Biostat... | MD | USA | 1 |
| 4 | 2001 | Monte Carlo exact conditional hypothesis tests... | University of Michigan Department of Statisti... | MI | USA | 1 |
| 5 | 2001 | Monte Carlo exact conditional hypothesis tests... | Ohio State University Department of Statistics... | OH | USA | 1 |
| 6 | 2002 | Model selection and fitting for empirical Baye... | JSM New York | NY | USA | 1 |
| 7 | 2002 | Ascent-based MCEM | Yale University Division of Biostatistics, New... | CT | USA | 1 |
| 8 | 2002 | ESUP accept/reject sampling | Johns Hopkins University Department of Biostat... | MD | USA | 1 |
| 9 | 2003 | A tour of biostatistics | Drexel University Department of Mathematics, P... | PA | USA | 1 |
