# Recipe X -- renaming Samples

### Where are Sample names stored?
In ipyrad, Sample names are stored as an attribute of Sample objects, and Sample objects are stored in a dictionary that can be accessed from Assembly objects. Let's first load in the data set below to show an example. 

In [36]:
## load ipyrad
import ipyrad as ip

## create a new Assembly Object named for our project
data = ip.Assembly("renaming")

## set path to some demultiplexed fastq data
data.set_params('sorted_fastq_path', "test_rad/sim_rad_*.fastq.gz")

## link the demultiplexed fastq files to new Samples
data.step1()

  loading Assembly: data1 [data/saved_states/state1_rad.assembly]


#### Accessing Python dictionaries
We loaded in an Assembly named `data` that has 12 Samples. We can view the Samples linked to this Assembly as a dictionary. You can see that each Sample has a name that is a __key__ in the dictionary, and the stored __values__ are objects called `<ipyrad.core.sample.Sample>`, which is just a reference to the fact they are ipyrad Sample objects. 

In [37]:
## print the key and value pairs in the dictionary
for key, val in data.samples.items():
    print key, val

2G_0 <ipyrad.core.sample.Sample object at 0x7f11b40beed0>
3K_0 <ipyrad.core.sample.Sample object at 0x7f11b40311d0>
3J_0 <ipyrad.core.sample.Sample object at 0x7f11b4006710>
2E_0 <ipyrad.core.sample.Sample object at 0x7f11b412fb50>
1A_0 <ipyrad.core.sample.Sample object at 0x7f11b4072390>
1B_0 <ipyrad.core.sample.Sample object at 0x7f11b4044490>
3I_0 <ipyrad.core.sample.Sample object at 0x7f11b40a6110>
3L_0 <ipyrad.core.sample.Sample object at 0x7f11b4015310>
2F_0 <ipyrad.core.sample.Sample object at 0x7f11b4010690>
1C_0 <ipyrad.core.sample.Sample object at 0x7f11b406f110>
1D_0 <ipyrad.core.sample.Sample object at 0x7f11b40a6950>
2H_0 <ipyrad.core.sample.Sample object at 0x7f11b4038610>


In [38]:
## Another way to access the same key:val pairs
for key in data.samples:
    print key, data.samples[key]

2G_0 <ipyrad.core.sample.Sample object at 0x7f11b40beed0>
3K_0 <ipyrad.core.sample.Sample object at 0x7f11b40311d0>
3J_0 <ipyrad.core.sample.Sample object at 0x7f11b4006710>
2E_0 <ipyrad.core.sample.Sample object at 0x7f11b412fb50>
1A_0 <ipyrad.core.sample.Sample object at 0x7f11b4072390>
1B_0 <ipyrad.core.sample.Sample object at 0x7f11b4044490>
3I_0 <ipyrad.core.sample.Sample object at 0x7f11b40a6110>
3L_0 <ipyrad.core.sample.Sample object at 0x7f11b4015310>
2F_0 <ipyrad.core.sample.Sample object at 0x7f11b4010690>
1C_0 <ipyrad.core.sample.Sample object at 0x7f11b406f110>
1D_0 <ipyrad.core.sample.Sample object at 0x7f11b40a6950>
2H_0 <ipyrad.core.sample.Sample object at 0x7f11b4038610>


### Sample object attributes

Sample objects have attributes of their own, including one called `name`. The `name` attribute is what ipyrad uses to keep track of file names, including those of output files, so this is the value we need to change. We will also want to change the dictionary __key__ to match, since the key is our easy way of referencing each Sample. 

In [39]:
## print name and an example stats attribute of a Sample object
for key, val in data.samples.items():
    print key, val.name, val.stats.reads_raw

2G_0 2G_0 20026.0
3K_0 3K_0 20117.0
3J_0 3J_0 20011.0
2E_0 2E_0 19928.0
1A_0 1A_0 20099.0
1B_0 1B_0 19977.0
3I_0 3I_0 20084.0
3L_0 3L_0 19901.0
2F_0 2F_0 19934.0
1C_0 1C_0 20114.0
1D_0 1D_0 19895.0
2H_0 2H_0 19936.0


### How can Sample names be changed?
By modifying the dictionary. Dictionary keys cannot be edited in place, so a rename means removing the Sample from the dictionary and storing it again under a new __key__, either in the same dictionary or in a new one. In both cases the __keys__ are the new Sample names we want, and the __vals__ are the Sample objects with their `name` attribute updated to match the keys. 
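
As a rough illustration of the new-dictionary approach, here is a minimal sketch that does not need ipyrad. `FakeSample`, the `samples` dict, and the names in `new_names` are hypothetical stand-ins for real Sample objects and `data.samples`, and only the `name` attribute is modeled.

```python
class FakeSample(object):
    """Hypothetical stand-in for an ipyrad Sample; only .name is modeled."""
    def __init__(self, name):
        self.name = name

## a plain dict standing in for data.samples
samples = {name: FakeSample(name) for name in ["1A_0", "1B_0", "1C_0"]}

## map each old name to the new name we want (made-up names)
new_names = {"1A_0": "popA_1", "1B_0": "popA_2", "1C_0": "popB_1"}

## build a new dict whose keys match the updated name attributes
renamed = {}
for old, sample in samples.items():
    ## update the name attribute first
    sample.name = new_names[old]
    ## then store the same object under the matching key
    renamed[sample.name] = sample
```

The objects themselves are not copied. `renamed` holds the same Sample objects as `samples`, only under new keys.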

### Replacing a single name

In [40]:
## use pop to select the Sample by its key
sample = data.samples.pop("1A_0")

## now update the sample.name attribute to a new val
sample.name = "1A_X"

## now put the Sample back into the dict w/ a matching key
data.samples[sample.name] = sample

In [41]:
for key, val in data.samples.items():
    print key, val.name, val

2G_0 2G_0 <ipyrad.core.sample.Sample object at 0x7f11b40beed0>
3K_0 3K_0 <ipyrad.core.sample.Sample object at 0x7f11b40311d0>
3J_0 3J_0 <ipyrad.core.sample.Sample object at 0x7f11b4006710>
2E_0 2E_0 <ipyrad.core.sample.Sample object at 0x7f11b412fb50>
1A_X 1A_X <ipyrad.core.sample.Sample object at 0x7f11b4072390>
1B_0 1B_0 <ipyrad.core.sample.Sample object at 0x7f11b4044490>
3I_0 3I_0 <ipyrad.core.sample.Sample object at 0x7f11b40a6110>
3L_0 3L_0 <ipyrad.core.sample.Sample object at 0x7f11b4015310>
2F_0 2F_0 <ipyrad.core.sample.Sample object at 0x7f11b4010690>
1C_0 1C_0 <ipyrad.core.sample.Sample object at 0x7f11b406f110>
1D_0 1D_0 <ipyrad.core.sample.Sample object at 0x7f11b40a6950>
2H_0 2H_0 <ipyrad.core.sample.Sample object at 0x7f11b4038610>
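
The pop, rename, and reinsert steps above could be wrapped in a small helper that also refuses to overwrite an existing Sample. This is only a sketch: `rename_sample` and `FakeSample` are hypothetical names and are not part of ipyrad, and the helper is written for any dict whose values have a `name` attribute.

```python
class FakeSample(object):
    """Hypothetical stand-in for an ipyrad Sample; only .name is modeled."""
    def __init__(self, name):
        self.name = name

def rename_sample(samples, old, new):
    """Move samples[old] to samples[new] and update its name attribute."""
    if old not in samples:
        raise KeyError("no Sample named %s" % old)
    if new != old and new in samples:
        raise ValueError("a Sample named %s already exists" % new)
    ## pop the Sample, update its name, and store it under the new key
    sample = samples.pop(old)
    sample.name = new
    samples[new] = sample
    return sample

## usage with a plain dict standing in for data.samples
samples = {name: FakeSample(name) for name in ["1A_0", "1B_0"]}
rename_sample(samples, "1A_0", "1A_X")
```

Both checks run before `pop` is called, so a failed rename leaves the dictionary unchanged.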


### Modifying all names
Problems arise when you try to modify a dictionary while you are iterating over it. The simplest way around this is to get the list of keys you wish to iterate over before starting the loop, so that the keys will not change during the iteration. Below we first store the key names in a list called `names`. Then we iterate over `names` and replace the keys and Sample names the same way as above.  

In [50]:
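## get a list of the keys in the dict; list() makes a copy, so the
## loop below is not affected when keys are popped and re-added
names = list(data.samples.keys())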

## iterate over keys in names
for samplekey in names:
    ## use pop to select the Sample by its key
    sample = data.samples.pop(samplekey)

    ## now update the sample.name attribute to a new val
    ## here we prepend "New_" to each Sample name; note that re-running
    ## this cell will prepend "New_" again
    sample.name = "New_"+sample.name

    ## now put the Sample back into the dict w/ a matching key
    data.samples[sample.name] = sample

## print the results
for key, val in data.samples.items():
    ## print the original fastq file for this sample
    print val.files.fastqs[0]
    ## print its new key and sample.name
    print key, val.name

('/home/deren/Documents/ipyrad/tests/test_rad/data1_fastqs/1C_0_R1_.fastq.gz',)
New_New_New_New_New_1C_0 New_New_New_New_New_1C_0
('/home/deren/Documents/ipyrad/tests/test_rad/data1_fastqs/3J_0_R1_.fastq.gz',)
New_New_New_New_New_3J_0 New_New_New_New_New_3J_0
('/home/deren/Documents/ipyrad/tests/test_rad/data1_fastqs/2H_0_R1_.fastq.gz',)
New_New_New_New_New_2H_0 New_New_New_New_New_2H_0
('/home/deren/Documents/ipyrad/tests/test_rad/data1_fastqs/1D_0_R1_.fastq.gz',)
New_New_New_New_New_1D_0 New_New_New_New_New_1D_0
('/home/deren/Documents/ipyrad/tests/test_rad/data1_fastqs/2E_0_R1_.fastq.gz',)
New_New_New_New_New_2E_0 New_New_New_New_New_2E_0
('/home/deren/Documents/ipyrad/tests/test_rad/data1_fastqs/1B_0_R1_.fastq.gz',)
New_New_New_New_New_1B_0 New_New_New_New_New_1B_0
('/home/deren/Documents/ipyrad/tests/test_rad/data1_fastqs/3I_0_R1_.fastq.gz',)
New_New_New_New_New_3I_0 New_New_New_New_New_3I_0
('/home/deren/Documents/ipyrad/tests/test_rad/data1_fastqs/2F_0_R1_.fastq.gz',)
New_New_Ne
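
The repeated `New_` prefix in the output above is what this loop produces when it is run more than once, since each run prepends the prefix again. One way to guard against that is to skip names that already carry the prefix. The sketch below assumes a plain dict and a hypothetical `FakeSample` class in place of `data.samples`, and `add_prefix` is a made-up helper, not an ipyrad function.

```python
class FakeSample(object):
    """Hypothetical stand-in for an ipyrad Sample; only .name is modeled."""
    def __init__(self, name):
        self.name = name

def add_prefix(samples, prefix):
    """Prepend prefix to every key and name that does not already have it."""
    ## list() copies the keys so the dict can be changed inside the loop
    for key in list(samples):
        if key.startswith(prefix):
            continue
        sample = samples.pop(key)
        sample.name = prefix + sample.name
        samples[sample.name] = sample

samples = {name: FakeSample(name) for name in ["1A_0", "1B_0", "1C_0"]}

## running it twice has the same effect as running it once
add_prefix(samples, "New_")
add_prefix(samples, "New_")
```

The trade-off is that a Sample whose original name happens to start with the prefix would be skipped, so pick a prefix that does not occur in the existing names.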

### Trying to change names after step2
Don't do this. It will break the paths that Sample objects use to find the files created in previous steps. You can only change Sample names after step1, when the Sample objects have just been created but have not yet been used to create new files. 
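
To make this rule harder to break by accident, a rename could first check how far the Sample has progressed. The sketch below assumes that each Sample records its state as `stats.state`, matching the `state` column that `data.stats` prints. `FakeStats`, `FakeSample`, and `rename_if_fresh` are hypothetical and are not part of ipyrad.

```python
class FakeStats(object):
    """Hypothetical stand-in for Sample.stats; only .state is modeled."""
    def __init__(self, state):
        self.state = state

class FakeSample(object):
    """Hypothetical stand-in for an ipyrad Sample."""
    def __init__(self, name, state):
        self.name = name
        self.stats = FakeStats(state)

def rename_if_fresh(samples, old, new):
    """Rename a Sample only if it has not gone past step1 (state 1)."""
    sample = samples[old]
    if sample.stats.state > 1:
        raise RuntimeError(
            "%s is at state %d; rename Samples before running step2"
            % (old, sample.stats.state))
    ## safe to rename: pop, update the name, and reinsert
    samples.pop(old)
    sample.name = new
    samples[new] = sample

## 1A_0 has only been through step1; 1B_0 has already been through step2
samples = {"1A_0": FakeSample("1A_0", 1), "1B_0": FakeSample("1B_0", 2)}
rename_if_fresh(samples, "1A_0", "1A_X")
```

Calling `rename_if_fresh(samples, "1B_0", "1B_X")` here would raise a `RuntimeError` and leave `1B_0` in place.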

In [44]:
data.step2()

In [46]:
print data.stats

          state  reads_raw  reads_filtered
New_1A_X      2      20099           20099
New_1B_0      2      19977           19977
New_1C_0      2      20114           20114
New_1D_0      2      19895           19895
New_2E_0      2      19928           19928
New_2F_0      2      19934           19934
New_2G_0      2      20026           20026
New_2H_0      2      19936           19936
New_3I_0      2      20084           20084
New_3J_0      2      20011           20011
New_3K_0      2      20117           20117
New_3L_0      2      19901           19901
