# Trial Publication Rates

This analysis examines how often clinical trials produce a publication, based on the data available from ClinicalTrials.gov. I first look at all studies, then at two subgroups: completed studies and studies that terminated before completion.

## Limitations

While authors are supposed to update ClinicalTrials.gov with publication data, it's not clear how well this is adhered to. There are likely studies that did result in a publication that was never reported to ClinicalTrials.gov.

In [None]:
require 'dbi'
require 'daru'
require 'nyaplot'
require 'yaml'

# change this to point to your aact_analysis directory
database = YAML.load_file("/home/dan/workspace/aact_analysis/secrets.yml")['database']; nil

In [None]:
db = DBI.connect("dbi:Mysql:#{database['name']}:#{database['host']}", database['user'], database['password']); nil

## Sample
Here I take a sample of studies that started after January 1, 2006 and completed before January 1, 2014. The minimum year of 2006 ensures that the data represents reasonably recent studies. I chose a maximum completion year of 2014 to allow each study at least two years to produce a publication.

In total this population represents 57,865 studies.

In [None]:
studies = Daru::DataFrame.from_sql(
  db, 
  "
    select
      cs.nct_id,
      overall_status,
      start_date,
      completion_date,
      completion_date_type,
      firstreceived_results_date,
      coalesce(reference_count, 0) as reference_count
    from clinical_study cs
    left outer join (
      select nct_id, count(1) as reference_count
      from `references`
      where 
        reference_type = 'Results Reference'
      group by nct_id
    ) ref
    on ref.nct_id = cs.nct_id
    where 
      start_date is not null and completion_date is not null
      and start_date > '2008-09-01' 
      and completion_date < '2014-01-01' 
      and completion_date_type = 'Actual'
      and study_type != 'Expanded Access'
      and overall_status = 'Completed'"
)

# label each study by how its results were shared: a publication,
# results posted on ClinicalTrials.gov, both, or neither
studies[:publication_status] = studies.map(:row) do |r| 
  if(r[:reference_count] > 0 && !r[:firstreceived_results_date].nil?)
    'Both'
  elsif(r[:reference_count] > 0 && r[:firstreceived_results_date].nil?)
    'Publication'
  elsif(r[:reference_count] == 0 && !r[:firstreceived_results_date].nil?)
    'Results on CT.gov'
  else
    'No Posted Results'
  end
end

studies.size
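
The four-way labelling rule in the cell above can be checked in isolation, without a database or Daru. The following is a minimal sketch in plain Ruby; `dissemination_label` is a hypothetical helper name introduced here for illustration, and the inputs are made up to exercise each branch.

```ruby
# Hypothetical helper (not part of Daru or AACT): returns the same four
# labels as the notebook cell, given a count of results references and
# the date results were first received on ClinicalTrials.gov (or nil).
def dissemination_label(reference_count, results_date)
  has_publication = reference_count.to_i > 0   # nil.to_i is 0, so nil counts as "no publication"
  has_posted_results = !results_date.nil?

  if has_publication && has_posted_results
    'Both'
  elsif has_publication
    'Publication'
  elsif has_posted_results
    'Results on CT.gov'
  else
    'No Posted Results'
  end
end

# Made-up inputs, one per branch
puts dissemination_label(2, '2013-05-01')
puts dissemination_label(1, nil)
puts dissemination_label(0, '2013-05-01')
puts dissemination_label(0, nil)
```

Naming the two conditions first keeps each branch to a single test, which makes it easier to see that the four labels are mutually exclusive and cover every case.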

## What percentage of these studies are published?


In [None]:
published_counts = studies.group_by(:publication_status).count[1..1]
published_counts.vectors = Daru::Index.new([:study_count])
published_counts[:status] = published_counts.index.to_a

published_counts[:percent_published] = published_counts.map(:row) do |r| 
  ((r[:study_count].to_f / published_counts[:study_count].sum.to_f) * 100).round(4)
end

published_counts = published_counts.sort([:study_count], ascending: false)

published_counts.plot type: :bar, y: :study_count, x: :status do |plot, diagram|
  plot.x_label 'Dissemination Status'
  plot.y_label '# of studies'
  plot.margin({top: 30, bottom: 100, left: 100, right: 30})
  diagram.color ['#84C76D']
end

In [None]:
published_counts
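
The percentage column above is each status count divided by the total number of studies, times 100. As a rough sketch of that arithmetic in plain Ruby, assuming the counts are available as a hash: `percent_shares` is a hypothetical helper, and the counts are placeholders, not results from this analysis.

```ruby
# Hypothetical helper: converts a hash of status => count into
# status => percentage of the total, rounded to 4 decimal places.
def percent_shares(counts)
  total = counts.values.sum.to_f
  return {} if total.zero?   # avoid dividing by zero on an empty sample

  counts.transform_values { |n| (n / total * 100).round(4) }
end

# Placeholder counts for illustration only
example_counts = {
  'No Posted Results' => 50,
  'Publication'       => 25,
  'Results on CT.gov' => 15,
  'Both'              => 10
}

shares = percent_shares(example_counts)

# Print the largest share first, mirroring the descending sort in the notebook
shares.sort_by { |_status, pct| -pct }.each do |status, pct|
  puts "#{status}: #{pct}%"
end
```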

## What percentage of studies that report having completed have published results within 2 years?

The percentages above are dramatic, but some of these studies may have reported that they terminated early. Although such studies did not finish as planned, it would be interesting to know how many studies appear to have completed as planned but never shared results.

In [None]:
completed_studies = studies.filter_rows { |r| r[:overall_status] == 'Completed' };
completed_studies.size

In [None]:
published_counts = completed_studies.group_by(:publication_status).count
published_counts[:status] = published_counts.index.to_a

published_counts[:percent] = published_counts.map(:row) { |r| ((r[:nct_id].to_f / published_counts[:nct_id].sum.to_f) * 100).round(4) }

published_counts
# published_counts.plot type: :bar, y: :percent, x: :status do |plot, diagram|
#   plot.x_label 'Publication Status'
#   plot.y_label '% of studies'
# end

Publication rates among completed studies seem to be similar to those of the general study sample. This raises the question of whether studies that did not complete as planned still published results.
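
One way to put the two groups side by side is to compute a single "shared any results" rate per `overall_status` in one pass. The sketch below uses plain Ruby and assumes the sample contains more than one `overall_status` value; `dissemination_rate_by_status` is a hypothetical helper, and the rows are made up for illustration.

```ruby
# Hypothetical helper: for each overall_status, the fraction of studies
# whose publication_status is anything other than 'No Posted Results'.
def dissemination_rate_by_status(rows)
  rows.group_by { |r| r[:overall_status] }.transform_values do |group|
    shared = group.count { |r| r[:publication_status] != 'No Posted Results' }
    (shared.to_f / group.size).round(4)
  end
end

# Made-up rows, not data from ClinicalTrials.gov
rows = [
  { overall_status: 'Completed',  publication_status: 'Both' },
  { overall_status: 'Completed',  publication_status: 'No Posted Results' },
  { overall_status: 'Completed',  publication_status: 'Publication' },
  { overall_status: 'Completed',  publication_status: 'No Posted Results' },
  { overall_status: 'Terminated', publication_status: 'Results on CT.gov' },
  { overall_status: 'Terminated', publication_status: 'No Posted Results' },
  { overall_status: 'Terminated', publication_status: 'No Posted Results' },
  { overall_status: 'Terminated', publication_status: 'No Posted Results' }
]

rates = dissemination_rate_by_status(rows)
rates.each { |status, rate| puts "#{status}: #{rate}" }
```

Collapsing the four labels into a single rate loses the distinction between journal publications and posted results, so it complements the per-label tables rather than replacing them.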

## What percentage of studies which terminated early published results?

In [None]:
terminated_studies = studies.filter_rows { |r| r[:overall_status] != 'Completed' };
terminated_studies.size

In [None]:
published_counts = terminated_studies.group_by(:publication_status).count
published_counts[:status] = published_counts.index.to_a

published_counts[:percent] = published_counts.map(:row) { |r| ((r[:nct_id].to_f / published_counts[:nct_id].sum.to_f) * 100).round(4) }

published_counts