# Lecture 24 2018-11-08: Matplotlib

Joining and merging dataframes; clustering analysis by column values.

All about indexing and using indexes.

In [2]:
import pandas as pd
from pandas import Series, DataFrame   # use these so often, this can be helpful

In [3]:
from matplotlib import pyplot as plt
%matplotlib inline

In [4]:
import numpy as np
from numpy.random import randn

##  Pandas Merge/Concatenate ##

Pandas can combine multiple dataframes using the values in a column (or columns) as a guide. This is called a *merge*, and there are various ways to merge depending on how you want to treat duplicate or missing values. 

Pandas can also combine dataframes by gluing them together, known as *concatenate*, either along the rows (axis=0, stacking vertically) or along the columns (axis=1, placing side by side). 

[Pandas manual on merge/join/concatenate](http://pandas.pydata.org/pandas-docs/stable/merging.html)

For these examples, we use a much shortened version of the microbiome data we've been working with. I'm adding a duplicate ID column, for one of the examples.

### Get data for examples

These are two microbiome datasets. One is the relative abundance data we've used before. 

The other is "metadata" about the individual samples. These metadata contain information about the sample such as when it was taken, from whom it was taken, the Nugent score (a diagnostic of *bacterial vaginosis*), and whether the patient was on medications when it was taken.

These are both very much shortened and simplified versions of the real world datasets, which we will explore in the homework.

In [11]:
counts = pd.read_table( 'vaginal_communities_short.txt' )
metadata = pd.read_table( 'vaginal_metadata_short.txt' )
metadata['newID'] = metadata['ID']

In [12]:
counts

Unnamed: 0,ID,time,sampleID,patientID,Week,Day,Batch_ID,Date,Lactobacillus_crispatus,Lactobacillus_iners,Lactobacillus_gasseri,Lactobacillus_jensenii,Atopobium_vaginae,Megasphaera_sp._type_1,Streptococcus_anginosus,Prevotella_genogroup_3,Clostridiales,Enterococcus_faecalis,Corynebacterium_accolens,Total
0,w1.1,1,AYAC01W1D1.113408.091509,AYAC01,1,1,113408,91509,4478,12,6,5,0,0,0,0,0,0,130,4719
1,w2.1,1,AYAC02W1D1.113568.091509,AYAC02,1,1,113568,91509,6147,422,14,225,1,0,0,0,0,0,0,6916
2,w1.67,67,AYAC01W10D4.199214.112009,AYAC01,10,4,199214,112009,1,227,0,0,100,1335,0,9,13,1,24,3950
3,w2.67,67,AYAC02W10D4.199246.112009,AYAC02,10,4,199246,112009,3197,1348,12,194,1,0,0,0,0,0,3,4848


In [13]:
metadata

Unnamed: 0,ID,DIA_DAY,DIA_WEEK,Woman,Score,MEDS,newID
0,w1.1,1,1,AYAC01,1,0,w1.1
1,w1.3,3,1,AYAC01,2,0,w1.3
2,w2.1,1,1,AYAC02,2,1,w2.1
3,w2.64,1,10,AYAC02,0,1,w2.64
4,w2.65,2,10,AYAC02,0,1,w2.65


### Merge ###
To merge two dataframes on a specific column c, use *merge(on=c)*. For merging on multiple columns, use a list of names: *merge(on=[c1, ..., cn])*. 

If not specified, merge is on all columns in the two dataframes that have the same name. To merge on columns with different names, say *l* and *r* as the column names for the left and right DataFrames, use *merge(left_on=l, right_on=r)*

By default, merge only keeps the rows which have the same values in both tables, throwing away rows that appear in only one DataFrame.
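
As a minimal sketch of the key options described above, the following uses small made-up tables (the names `left`, `right`, `left2`, `right2` and their values are illustrative assumptions, not the microbiome files) to show merging on differently named columns and on a list of columns.

```python
import pandas as pd

# Hypothetical toy tables standing in for the real data files
left = pd.DataFrame({'ID': ['a', 'b', 'c'], 'count': [10, 20, 30]})
right = pd.DataFrame({'newID': ['a', 'b', 'd'], 'score': [1, 2, 3]})

# The key columns have different names, so name each side explicitly.
# Only 'a' and 'b' appear in both tables, so only those rows survive.
merged = pd.merge(left, right, left_on='ID', right_on='newID')
print(merged)

# Merging on several columns at once: a row matches only if
# every listed column agrees.
left2 = pd.DataFrame({'Woman': ['A', 'A', 'B'], 'Week': [1, 2, 1], 'count': [5, 6, 7]})
right2 = pd.DataFrame({'Woman': ['A', 'B', 'B'], 'Week': [1, 1, 2], 'score': [0, 1, 2]})
merged2 = pd.merge(left2, right2, on=['Woman', 'Week'])
print(merged2)
```

Because the key names differ in the first merge, both key columns (`ID` and `newID`) are kept in the result.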

In [14]:
pd.merge( counts, metadata, on='ID' )

Unnamed: 0,ID,time,sampleID,patientID,Week,Day,Batch_ID,Date,Lactobacillus_crispatus,Lactobacillus_iners,...,Clostridiales,Enterococcus_faecalis,Corynebacterium_accolens,Total,DIA_DAY,DIA_WEEK,Woman,Score,MEDS,newID
0,w1.1,1,AYAC01W1D1.113408.091509,AYAC01,1,1,113408,91509,4478,12,...,0,0,130,4719,1,1,AYAC01,1,0,w1.1
1,w2.1,1,AYAC02W1D1.113568.091509,AYAC02,1,1,113568,91509,6147,422,...,0,0,0,6916,1,1,AYAC02,2,1,w2.1


In [15]:
# notice new columns "ID_x" and "ID_y" for the left and matched right values
pd.merge(counts, metadata, left_on='ID', right_on='newID')

Unnamed: 0,ID_x,time,sampleID,patientID,Week,Day,Batch_ID,Date,Lactobacillus_crispatus,Lactobacillus_iners,...,Enterococcus_faecalis,Corynebacterium_accolens,Total,ID_y,DIA_DAY,DIA_WEEK,Woman,Score,MEDS,newID
0,w1.1,1,AYAC01W1D1.113408.091509,AYAC01,1,1,113408,91509,4478,12,...,0,130,4719,w1.1,1,1,AYAC01,1,0,w1.1
1,w2.1,1,AYAC02W1D1.113568.091509,AYAC02,1,1,113568,91509,6147,422,...,0,0,6916,w2.1,1,1,AYAC02,2,1,w2.1


#### Inner Versus Outer ####
*pd.merge(how=)* specifies how to do the merge: whether to keep all key values from the left DataFrame, the right DataFrame, or both, or only the key values common to both.

how   | Meaning
:---- | :--------
inner | (default) merge on common values only
outer | merge on all possible values
left  | use all the values from the left DataFrame, inserting *NaN* if necessary
right | use all the values from the right DataFrame, inserting *NaN* if necessary
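
To make the table concrete, here is a small sketch that counts the rows each *how* setting returns. The two toy tables are assumptions made up for illustration. The *indicator=True* argument of *pd.merge* adds a `_merge` column saying which table each row came from.

```python
import pandas as pd

# Hypothetical toy tables: 'a' and 'b' are shared, 'c' is left-only,
# 'd' and 'e' are right-only.
left = pd.DataFrame({'ID': ['a', 'b', 'c'], 'count': [10, 20, 30]})
right = pd.DataFrame({'ID': ['a', 'b', 'd', 'e'], 'score': [1, 2, 3, 4]})

sizes = {}
for how in ['inner', 'outer', 'left', 'right']:
    merged = pd.merge(left, right, on='ID', how=how)
    sizes[how] = len(merged)
    print(how, len(merged))

# indicator=True records the origin of each row in a '_merge' column
flagged = pd.merge(left, right, on='ID', how='outer', indicator=True)
print(flagged[['ID', '_merge']])
```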

In [16]:
# restore values, after we've messed them up, above
counts = pd.read_table( 'vaginal_communities_short.txt' )
metadata = pd.read_table( 'vaginal_metadata_short.txt' )
metadata['newID'] = metadata['ID']

In [17]:
counts

Unnamed: 0,ID,time,sampleID,patientID,Week,Day,Batch_ID,Date,Lactobacillus_crispatus,Lactobacillus_iners,Lactobacillus_gasseri,Lactobacillus_jensenii,Atopobium_vaginae,Megasphaera_sp._type_1,Streptococcus_anginosus,Prevotella_genogroup_3,Clostridiales,Enterococcus_faecalis,Corynebacterium_accolens,Total
0,w1.1,1,AYAC01W1D1.113408.091509,AYAC01,1,1,113408,91509,4478,12,6,5,0,0,0,0,0,0,130,4719
1,w2.1,1,AYAC02W1D1.113568.091509,AYAC02,1,1,113568,91509,6147,422,14,225,1,0,0,0,0,0,0,6916
2,w1.67,67,AYAC01W10D4.199214.112009,AYAC01,10,4,199214,112009,1,227,0,0,100,1335,0,9,13,1,24,3950
3,w2.67,67,AYAC02W10D4.199246.112009,AYAC02,10,4,199246,112009,3197,1348,12,194,1,0,0,0,0,0,3,4848


In [37]:
metadata

Unnamed: 0,ID,DIA_DAY,DIA_WEEK,Woman,Score,MEDS
0,w1.1,1,1,AYAC01,1,0
1,w1.3,3,1,AYAC01,2,0
2,w2.1,1,1,AYAC02,2,1
3,w2.64,1,10,AYAC02,0,1
4,w2.65,2,10,AYAC02,0,1


In [18]:
pd.merge( counts, metadata, on='ID', how='inner' )

Unnamed: 0,ID,time,sampleID,patientID,Week,Day,Batch_ID,Date,Lactobacillus_crispatus,Lactobacillus_iners,...,Clostridiales,Enterococcus_faecalis,Corynebacterium_accolens,Total,DIA_DAY,DIA_WEEK,Woman,Score,MEDS,newID
0,w1.1,1,AYAC01W1D1.113408.091509,AYAC01,1,1,113408,91509,4478,12,...,0,0,130,4719,1,1,AYAC01,1,0,w1.1
1,w2.1,1,AYAC02W1D1.113568.091509,AYAC02,1,1,113568,91509,6147,422,...,0,0,0,6916,1,1,AYAC02,2,1,w2.1


In [19]:
pd.merge( counts, metadata, on='ID', how='outer' )

Unnamed: 0,ID,time,sampleID,patientID,Week,Day,Batch_ID,Date,Lactobacillus_crispatus,Lactobacillus_iners,...,Clostridiales,Enterococcus_faecalis,Corynebacterium_accolens,Total,DIA_DAY,DIA_WEEK,Woman,Score,MEDS,newID
0,w1.1,1.0,AYAC01W1D1.113408.091509,AYAC01,1.0,1.0,113408.0,91509.0,4478.0,12.0,...,0.0,0.0,130.0,4719.0,1.0,1.0,AYAC01,1.0,0.0,w1.1
1,w2.1,1.0,AYAC02W1D1.113568.091509,AYAC02,1.0,1.0,113568.0,91509.0,6147.0,422.0,...,0.0,0.0,0.0,6916.0,1.0,1.0,AYAC02,2.0,1.0,w2.1
2,w1.67,67.0,AYAC01W10D4.199214.112009,AYAC01,10.0,4.0,199214.0,112009.0,1.0,227.0,...,13.0,1.0,24.0,3950.0,,,,,,
3,w2.67,67.0,AYAC02W10D4.199246.112009,AYAC02,10.0,4.0,199246.0,112009.0,3197.0,1348.0,...,0.0,0.0,3.0,4848.0,,,,,,
4,w1.3,,,,,,,,,,...,,,,,3.0,1.0,AYAC01,2.0,0.0,w1.3
5,w2.64,,,,,,,,,,...,,,,,1.0,10.0,AYAC02,0.0,1.0,w2.64
6,w2.65,,,,,,,,,,...,,,,,2.0,10.0,AYAC02,0.0,1.0,w2.65


In [20]:
pd.merge( counts, metadata, on='ID', how='left' )

Unnamed: 0,ID,time,sampleID,patientID,Week,Day,Batch_ID,Date,Lactobacillus_crispatus,Lactobacillus_iners,...,Clostridiales,Enterococcus_faecalis,Corynebacterium_accolens,Total,DIA_DAY,DIA_WEEK,Woman,Score,MEDS,newID
0,w1.1,1,AYAC01W1D1.113408.091509,AYAC01,1,1,113408,91509,4478,12,...,0,0,130,4719,1.0,1.0,AYAC01,1.0,0.0,w1.1
1,w2.1,1,AYAC02W1D1.113568.091509,AYAC02,1,1,113568,91509,6147,422,...,0,0,0,6916,1.0,1.0,AYAC02,2.0,1.0,w2.1
2,w1.67,67,AYAC01W10D4.199214.112009,AYAC01,10,4,199214,112009,1,227,...,13,1,24,3950,,,,,,
3,w2.67,67,AYAC02W10D4.199246.112009,AYAC02,10,4,199246,112009,3197,1348,...,0,0,3,4848,,,,,,


In [21]:
pd.merge( counts, metadata, on='ID', how='right' )

Unnamed: 0,ID,time,sampleID,patientID,Week,Day,Batch_ID,Date,Lactobacillus_crispatus,Lactobacillus_iners,...,Clostridiales,Enterococcus_faecalis,Corynebacterium_accolens,Total,DIA_DAY,DIA_WEEK,Woman,Score,MEDS,newID
0,w1.1,1.0,AYAC01W1D1.113408.091509,AYAC01,1.0,1.0,113408.0,91509.0,4478.0,12.0,...,0.0,0.0,130.0,4719.0,1,1,AYAC01,1,0,w1.1
1,w2.1,1.0,AYAC02W1D1.113568.091509,AYAC02,1.0,1.0,113568.0,91509.0,6147.0,422.0,...,0.0,0.0,0.0,6916.0,1,1,AYAC02,2,1,w2.1
2,w1.3,,,,,,,,,,...,,,,,3,1,AYAC01,2,0,w1.3
3,w2.64,,,,,,,,,,...,,,,,1,10,AYAC02,0,1,w2.64
4,w2.65,,,,,,,,,,...,,,,,2,10,AYAC02,0,1,w2.65


#### Merge on Indexes ####
You can also merge on indexes, rather than columns. This works for multi-indexes, too.

Use *merge(left_index=True)* or *merge(right_index=True)* to say whether to merge on index values (rather than column values).

You can mix merging by columns and by indexes by combining the *_index* and *_on* parameters. 
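
A minimal sketch of both variants, using made-up toy tables (the names `left`, `right` and their values are assumptions for illustration): first index against index, then the left index against a right-hand column.

```python
import pandas as pd

# Hypothetical toy tables; 'left' is indexed by ID, 'right' is not indexed
left = pd.DataFrame({'count': [10, 20, 30]},
                    index=pd.Index(['a', 'b', 'c'], name='ID'))
right = pd.DataFrame({'newID': ['a', 'b', 'd'], 'score': [1, 2, 3]})

# Index against index: both sides must carry the key in their index
both_index = pd.merge(left, right.set_index('newID'),
                      left_index=True, right_index=True)
print(both_index)

# Mixed: match the left index against the right column 'newID'
mixed = pd.merge(left, right, left_index=True, right_on='newID')
print(mixed)
```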

In [22]:
counts = pd.read_table( 'vaginal_communities_short.txt' )
metadata = pd.read_table( 'vaginal_metadata_short.txt' )
metadata['newID'] = metadata['ID']

In [23]:
# index the datasets
counts.set_index('ID', inplace=True)
metadata.set_index('ID', inplace=True)

#counts
#metadata

In [24]:
pd.merge( counts, metadata, left_index=True, right_index=True )

Unnamed: 0_level_0,time,sampleID,patientID,Week,Day,Batch_ID,Date,Lactobacillus_crispatus,Lactobacillus_iners,Lactobacillus_gasseri,...,Clostridiales,Enterococcus_faecalis,Corynebacterium_accolens,Total,DIA_DAY,DIA_WEEK,Woman,Score,MEDS,newID
ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
w1.1,1,AYAC01W1D1.113408.091509,AYAC01,1,1,113408,91509,4478,12,6,...,0,0,130,4719,1,1,AYAC01,1,0,w1.1
w2.1,1,AYAC02W1D1.113568.091509,AYAC02,1,1,113568,91509,6147,422,14,...,0,0,0,6916,1,1,AYAC02,2,1,w2.1


In [25]:
pd.merge(counts, metadata, left_index=True, right_on='newID')

Unnamed: 0_level_0,time,sampleID,patientID,Week,Day,Batch_ID,Date,Lactobacillus_crispatus,Lactobacillus_iners,Lactobacillus_gasseri,...,Clostridiales,Enterococcus_faecalis,Corynebacterium_accolens,Total,DIA_DAY,DIA_WEEK,Woman,Score,MEDS,newID
ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
w1.1,1,AYAC01W1D1.113408.091509,AYAC01,1,1,113408,91509,4478,12,6,...,0,0,130,4719,1,1,AYAC01,1,0,w1.1
w2.1,1,AYAC02W1D1.113568.091509,AYAC02,1,1,113568,91509,6147,422,14,...,0,0,0,6916,1,1,AYAC02,2,1,w2.1


Of course, the other types of merge still work:

In [26]:
pd.merge(counts, metadata, 
         left_index=True, 
         right_on='newID', 
         how='outer' 
        )

Unnamed: 0_level_0,time,sampleID,patientID,Week,Day,Batch_ID,Date,Lactobacillus_crispatus,Lactobacillus_iners,Lactobacillus_gasseri,...,Clostridiales,Enterococcus_faecalis,Corynebacterium_accolens,Total,DIA_DAY,DIA_WEEK,Woman,Score,MEDS,newID
ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
w1.1,1.0,AYAC01W1D1.113408.091509,AYAC01,1.0,1.0,113408.0,91509.0,4478.0,12.0,6.0,...,0.0,0.0,130.0,4719.0,1.0,1.0,AYAC01,1.0,0.0,w1.1
w2.1,1.0,AYAC02W1D1.113568.091509,AYAC02,1.0,1.0,113568.0,91509.0,6147.0,422.0,14.0,...,0.0,0.0,0.0,6916.0,1.0,1.0,AYAC02,2.0,1.0,w2.1
w2.65,67.0,AYAC01W10D4.199214.112009,AYAC01,10.0,4.0,199214.0,112009.0,1.0,227.0,0.0,...,13.0,1.0,24.0,3950.0,,,,,,w1.67
w2.65,67.0,AYAC02W10D4.199246.112009,AYAC02,10.0,4.0,199246.0,112009.0,3197.0,1348.0,12.0,...,0.0,0.0,3.0,4848.0,,,,,,w2.67
w1.3,,,,,,,,,,,...,,,,,3.0,1.0,AYAC01,2.0,0.0,w1.3
w2.64,,,,,,,,,,,...,,,,,1.0,10.0,AYAC02,0.0,1.0,w2.64
w2.65,,,,,,,,,,,...,,,,,2.0,10.0,AYAC02,0.0,1.0,w2.65


### Merging on multi level indexes

In [58]:
metadata = pd.read_table( 'vaginal_metadata_short.txt' )
metadata['newID'] = metadata['ID']
metadata.set_index(['Woman', 'DIA_WEEK', 'DIA_DAY'], inplace=True)
metadata.index.names = [ 'Woman', 'Week', 'Day' ]
metadata

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,ID,Score,MEDS,newID
Woman,Week,Day,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
AYAC01,1,1,w1.1,1,0,w1.1
AYAC01,1,3,w1.3,2,0,w1.3
AYAC02,1,1,w2.1,2,1,w2.1
AYAC02,10,1,w2.64,0,1,w2.64
AYAC02,10,2,w2.65,0,1,w2.65


In [59]:
counts = pd.read_table( 'vaginal_communities_short.txt' )
counts.set_index(['patientID', 'Week', 'Day'], inplace=True)
counts.index.names = [ 'Woman', 'Week', 'Day' ]   # <--- understand this!
counts

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,ID,time,sampleID,Batch_ID,Date,Lactobacillus_crispatus,Lactobacillus_iners,Lactobacillus_gasseri,Lactobacillus_jensenii,Atopobium_vaginae,Megasphaera_sp._type_1,Streptococcus_anginosus,Prevotella_genogroup_3,Clostridiales,Enterococcus_faecalis,Corynebacterium_accolens,Total
Woman,Week,Day,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1
AYAC01,1,1,w1.1,1,AYAC01W1D1.113408.091509,113408,91509,4478,12,6,5,0,0,0,0,0,0,130,4719
AYAC02,1,1,w2.1,1,AYAC02W1D1.113568.091509,113568,91509,6147,422,14,225,1,0,0,0,0,0,0,6916
AYAC01,10,4,w1.67,67,AYAC01W10D4.199214.112009,199214,112009,1,227,0,0,100,1335,0,9,13,1,24,3950
AYAC02,10,4,w2.67,67,AYAC02W10D4.199246.112009,199246,112009,3197,1348,12,194,1,0,0,0,0,0,3,4848


In [60]:
pd.merge(counts, metadata)   # no on= given: merges on the shared column 'ID'; the index levels are dropped

Unnamed: 0,ID,time,sampleID,Batch_ID,Date,Lactobacillus_crispatus,Lactobacillus_iners,Lactobacillus_gasseri,Lactobacillus_jensenii,Atopobium_vaginae,Megasphaera_sp._type_1,Streptococcus_anginosus,Prevotella_genogroup_3,Clostridiales,Enterococcus_faecalis,Corynebacterium_accolens,Total,Score,MEDS,newID
0,w1.1,1,AYAC01W1D1.113408.091509,113408,91509,4478,12,6,5,0,0,0,0,0,0,130,4719,1,0,w1.1
1,w2.1,1,AYAC02W1D1.113568.091509,113568,91509,6147,422,14,225,1,0,0,0,0,0,0,6916,2,1,w2.1
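
With no *on=* and no *_index* arguments, *merge* matches on the column names the two tables share, and the Woman/Week/Day index does not appear in the output above. To match on the index levels themselves, one option is to pass *left_index=True, right_index=True*. The sketch below uses small made-up tables (`counts_toy`, `meta_toy` and their values are assumptions, not the real files).

```python
import pandas as pd

# Hypothetical toy tables with the same three index levels
counts_toy = pd.DataFrame(
    {'Woman': ['W1', 'W2', 'W1'], 'Week': [1, 1, 10], 'Day': [1, 1, 4],
     'Total': [100, 200, 300]}
).set_index(['Woman', 'Week', 'Day'])

meta_toy = pd.DataFrame(
    {'Woman': ['W1', 'W1', 'W2'], 'Week': [1, 1, 1], 'Day': [1, 3, 1],
     'Score': [1, 2, 2]}
).set_index(['Woman', 'Week', 'Day'])

# Match rows whose (Woman, Week, Day) index tuples are equal;
# the multi-level index is kept in the result.
by_index = pd.merge(counts_toy, meta_toy, left_index=True, right_index=True)
print(by_index)
```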


#### Hierarchical indexes

I generally find it easier to flatten the indexes with *reset_index()* before merging, rather than trying to remember all the merge parameters needed to get more than one index in play.

In [61]:
pd.merge(counts.reset_index(), metadata.reset_index(),
         on=['Week', 'Day'],
         #how='outer'
        )

Unnamed: 0,Woman_x,Week,Day,ID_x,time,sampleID,Batch_ID,Date,Lactobacillus_crispatus,Lactobacillus_iners,...,Prevotella_genogroup_3,Clostridiales,Enterococcus_faecalis,Corynebacterium_accolens,Total,Woman_y,ID_y,Score,MEDS,newID
0,AYAC01,1,1,w1.1,1,AYAC01W1D1.113408.091509,113408,91509,4478,12,...,0,0,0,130,4719,AYAC01,w1.1,1,0,w1.1
1,AYAC01,1,1,w1.1,1,AYAC01W1D1.113408.091509,113408,91509,4478,12,...,0,0,0,130,4719,AYAC02,w2.1,2,1,w2.1
2,AYAC02,1,1,w2.1,1,AYAC02W1D1.113568.091509,113568,91509,6147,422,...,0,0,0,0,6916,AYAC01,w1.1,1,0,w1.1
3,AYAC02,1,1,w2.1,1,AYAC02W1D1.113568.091509,113568,91509,6147,422,...,0,0,0,0,6916,AYAC02,w2.1,2,1,w2.1
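
Note that merging on only Week and Day pairs every row with every other row that shares those two values, regardless of woman, which is why the output above has `Woman_x` and `Woman_y` columns and four rows. A sketch with made-up toy tables (names and values are assumptions) shows the difference when Woman is included in the keys.

```python
import pandas as pd

# Hypothetical toy tables: two women sampled on the same week and day
counts_toy = pd.DataFrame(
    {'Woman': ['W1', 'W2'], 'Week': [1, 1], 'Day': [1, 1], 'Total': [100, 200]}
).set_index(['Woman', 'Week', 'Day'])
meta_toy = pd.DataFrame(
    {'Woman': ['W1', 'W2'], 'Week': [1, 1], 'Day': [1, 1], 'Score': [1, 2]}
).set_index(['Woman', 'Week', 'Day'])

flat_counts = counts_toy.reset_index()   # index levels become ordinary columns
flat_meta = meta_toy.reset_index()

# Two keys: every (Week, Day) match is kept, so women get cross-paired
two_keys = pd.merge(flat_counts, flat_meta, on=['Week', 'Day'])
print(len(two_keys), list(two_keys.columns))

# Three keys: each sample is matched only with its own metadata row
three_keys = pd.merge(flat_counts, flat_meta, on=['Woman', 'Week', 'Day'])
print(three_keys)
```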


### Concatenate ###
Here we get more metadata from another file.

To glue DataFrames d1, ... dn together side by side (horizontally), adding columns, use *pd.concat([d1, ... dn], axis=1)*. 

The default is axis=0, which stacks the dataframes vertically, adding rows.
Concatenating on axis 0 (vertically) is often useful when working with sequence data, since you get different datasets for each lane of the sequencing machine.

*concat* adds *NaN* values when necessary.
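
A small sketch of both directions, using made-up toy tables (`a`, `b` and their values are assumptions for illustration). Note that *concat* lines rows up by index label, not by the values in any ID column, so on axis=1 rows that sit next to each other need not describe the same sample.

```python
import pandas as pd

# Hypothetical toy tables with one shared column and one distinct column each
a = pd.DataFrame({'ID': ['w1', 'w2'], 'Score': [1, 2]})
b = pd.DataFrame({'ID': ['w3'], 'SLEEP': [7]})

# axis=0 (default): stack rows; columns missing from a table become NaN
stacked = pd.concat([a, b], axis=0)
print(stacked)

# axis=1: place side by side, aligned on index labels; b has no row
# with index label 1, so its columns are NaN there
side = pd.concat([a, b], axis=1)
print(side)
```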

In this example, the two files are both metadata, but one has just medications and the other has sleep data.

In [62]:
metadata_2 = pd.read_table( 'vaginal_metadata_short_2.txt' )
metadata_2

Unnamed: 0,ID,DIA_DAY,DIA_WEEK,Woman,MEDS_SP,SLEEP
0,w1.1,1,1,AYAC01,,7
1,w1.2,1,2,AYAC01,"MOTRIN, NYQUIL, THERAFLU",7
2,w1.3,3,1,AYAC01,,8
3,w2.1,1,1,AYAC02,SYNTHROID,6
4,w2.64,1,10,AYAC02,SYNTHROID,7
5,w2.65,2,10,AYAC02,SYNTHROID,6


In [63]:
metadata = pd.read_table( 'vaginal_metadata_short.txt' )
metadata

Unnamed: 0,ID,DIA_DAY,DIA_WEEK,Woman,Score,MEDS
0,w1.1,1,1,AYAC01,1,0
1,w1.3,3,1,AYAC01,2,0
2,w2.1,1,1,AYAC02,2,1
3,w2.64,1,10,AYAC02,0,1
4,w2.65,2,10,AYAC02,0,1


In [64]:
pd.concat([metadata_2, metadata], axis=1)

Unnamed: 0,ID,DIA_DAY,DIA_WEEK,Woman,MEDS_SP,SLEEP,ID.1,DIA_DAY.1,DIA_WEEK.1,Woman.1,Score,MEDS
0,w1.1,1,1,AYAC01,,7,w1.1,1.0,1.0,AYAC01,1.0,0.0
1,w1.2,1,2,AYAC01,"MOTRIN, NYQUIL, THERAFLU",7,w1.3,3.0,1.0,AYAC01,2.0,0.0
2,w1.3,3,1,AYAC01,,8,w2.1,1.0,1.0,AYAC02,2.0,1.0
3,w2.1,1,1,AYAC02,SYNTHROID,6,w2.64,1.0,10.0,AYAC02,0.0,1.0
4,w2.64,1,10,AYAC02,SYNTHROID,7,w2.65,2.0,10.0,AYAC02,0.0,1.0
5,w2.65,2,10,AYAC02,SYNTHROID,6,,,,,,


In [65]:
md3 = DataFrame([['w1.4', 1, 4, 'AYAC01', 3, 0],
                 ['w3.1', 1, 1, 'AYAC99', 1, 0],
                 ['w3.4', 1, 4, 'AYAC99', 5, 0],
                ],
                columns=metadata.columns
               )
md3

Unnamed: 0,ID,DIA_DAY,DIA_WEEK,Woman,Score,MEDS
0,w1.4,1,4,AYAC01,3,0
1,w3.1,1,1,AYAC99,1,0
2,w3.4,1,4,AYAC99,5,0


In [66]:
pd.concat([metadata, md3], axis=0)

Unnamed: 0,ID,DIA_DAY,DIA_WEEK,Woman,Score,MEDS
0,w1.1,1,1,AYAC01,1,0
1,w1.3,3,1,AYAC01,2,0
2,w2.1,1,1,AYAC02,2,1
3,w2.64,1,10,AYAC02,0,1
4,w2.65,2,10,AYAC02,0,1
0,w1.4,1,4,AYAC01,3,0
1,w3.1,1,1,AYAC99,1,0
2,w3.4,1,4,AYAC99,5,0
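
In the output above the index labels 0, 1, 2 appear twice, because *concat* keeps each table's original index. If unique row labels are wanted, one option is *ignore_index=True*, sketched here with made-up toy tables (`first`, `second` and their values are assumptions).

```python
import pandas as pd

# Hypothetical toy tables, each with its own default 0..n-1 index
first = pd.DataFrame({'ID': ['w1.1', 'w1.3'], 'Score': [1, 2]})
second = pd.DataFrame({'ID': ['w1.4'], 'Score': [3]})

# Default: original index labels are kept, so 0 appears twice
kept = pd.concat([first, second], axis=0)
print(list(kept.index))

# ignore_index=True discards the old labels and renumbers from 0
renumbered = pd.concat([first, second], axis=0, ignore_index=True)
print(list(renumbered.index))
```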
