# Recipe X -- renaming Samples

### Where are Sample names stored?
In ipyrad, Sample names are stored as an attribute of __Sample__ objects, and Sample objects are stored in a dictionary that can be accessed from __Assembly__ objects. Let's first create an Assembly object from the test data included with ipyrad to use as an example. 

We load existing fastq files because this is probably the most common case in which you would want Sample names to differ from the names of the input data files. By default, Sample names are extracted from the names of the input fastq files. 

In [1]:
## load ipyrad
import ipyrad as ip

## create a new Assembly Object named for our project
data = ip.Assembly("data")

## set path to some demultiplexed fastq data
data.set_params("sorted_fastq_path", "test_rad/data1_fastqs/*.gz")

## link the demultiplexed fastq files to new Samples
data.step1()

## print the dictionary of Samples
data.samples

DEBUG:ipyrad:H4CKERZ-mode: __loglevel__ = DEBUG
INFO:ipyrad.core.assembly:try 10: starting controller
DEBUG:ipyrad.core.assembly:OK! Connected to (4) engines


  New Assembly: data
12 new Samples created in `data`.
12 fastq files linked to 12 new Samples.


{'1A_0': <ipyrad.core.sample.Sample at 0x7f3d759dcf10>,
 '1B_0': <ipyrad.core.sample.Sample at 0x7f3d759dc850>,
 '1C_0': <ipyrad.core.sample.Sample at 0x7f3d759cf090>,
 '1D_0': <ipyrad.core.sample.Sample at 0x7f3d759cf8d0>,
 '2E_0': <ipyrad.core.sample.Sample at 0x7f3d75a3b190>,
 '2F_0': <ipyrad.core.sample.Sample at 0x7f3d759b0d10>,
 '2G_0': <ipyrad.core.sample.Sample at 0x7f3d759c31d0>,
 '2H_0': <ipyrad.core.sample.Sample at 0x7f3d759c3b90>,
 '3I_0': <ipyrad.core.sample.Sample at 0x7f3d75a49350>,
 '3J_0': <ipyrad.core.sample.Sample at 0x7f3d75a21f10>,
 '3K_0': <ipyrad.core.sample.Sample at 0x7f3d75a41a90>,
 '3L_0': <ipyrad.core.sample.Sample at 0x7f3d75a414d0>}

### A primer on Python dictionaries
We created an Assembly object with 12 Samples. We can view the Samples linked to this Assembly as a dictionary. Each Sample has a name that is a __key__ in the dictionary, and the stored __values__ are displayed as `<ipyrad.core.sample.Sample>`, which simply indicates that they are ipyrad Sample objects. 

In [2]:
## print the key and value pairs in the dictionary
for key, val in data.samples.items():
    print("{} {}".format(key, val))

2G_0 <ipyrad.core.sample.Sample object at 0x7f3d759c31d0>
3K_0 <ipyrad.core.sample.Sample object at 0x7f3d75a41a90>
3J_0 <ipyrad.core.sample.Sample object at 0x7f3d75a21f10>
2E_0 <ipyrad.core.sample.Sample object at 0x7f3d75a3b190>
1A_0 <ipyrad.core.sample.Sample object at 0x7f3d759dcf10>
1B_0 <ipyrad.core.sample.Sample object at 0x7f3d759dc850>
3I_0 <ipyrad.core.sample.Sample object at 0x7f3d75a49350>
3L_0 <ipyrad.core.sample.Sample object at 0x7f3d75a414d0>
2F_0 <ipyrad.core.sample.Sample object at 0x7f3d759b0d10>
1C_0 <ipyrad.core.sample.Sample object at 0x7f3d759cf090>
1D_0 <ipyrad.core.sample.Sample object at 0x7f3d759cf8d0>
2H_0 <ipyrad.core.sample.Sample object at 0x7f3d759c3b90>


In [3]:
## Another way to access the same key:val pairs
for key in data.samples:
    print("{} {}".format(key, data.samples[key]))

2G_0 <ipyrad.core.sample.Sample object at 0x7f3d759c31d0>
3K_0 <ipyrad.core.sample.Sample object at 0x7f3d75a41a90>
3J_0 <ipyrad.core.sample.Sample object at 0x7f3d75a21f10>
2E_0 <ipyrad.core.sample.Sample object at 0x7f3d75a3b190>
1A_0 <ipyrad.core.sample.Sample object at 0x7f3d759dcf10>
1B_0 <ipyrad.core.sample.Sample object at 0x7f3d759dc850>
3I_0 <ipyrad.core.sample.Sample object at 0x7f3d75a49350>
3L_0 <ipyrad.core.sample.Sample object at 0x7f3d75a414d0>
2F_0 <ipyrad.core.sample.Sample object at 0x7f3d759b0d10>
1C_0 <ipyrad.core.sample.Sample object at 0x7f3d759cf090>
1D_0 <ipyrad.core.sample.Sample object at 0x7f3d759cf8d0>
2H_0 <ipyrad.core.sample.Sample object at 0x7f3d759c3b90>


### Sample object attributes

Sample objects have attributes of their own, including one called __name__, which we can access. The name attribute of a Sample is what ipyrad uses when naming that Sample's files, so this is the important value to change. However, we also need to change the dictionary __key__, since the key is what is used to look up the Sample. 

In [4]:
## print key and name attribute of a Sample object
for key, val in data.samples.items():
    print("{} {} {}".format(key, val.name, val))

2G_0 2G_0 <ipyrad.core.sample.Sample object at 0x7f3d759c31d0>
3K_0 3K_0 <ipyrad.core.sample.Sample object at 0x7f3d75a41a90>
3J_0 3J_0 <ipyrad.core.sample.Sample object at 0x7f3d75a21f10>
2E_0 2E_0 <ipyrad.core.sample.Sample object at 0x7f3d75a3b190>
1A_0 1A_0 <ipyrad.core.sample.Sample object at 0x7f3d759dcf10>
1B_0 1B_0 <ipyrad.core.sample.Sample object at 0x7f3d759dc850>
3I_0 3I_0 <ipyrad.core.sample.Sample object at 0x7f3d75a49350>
3L_0 3L_0 <ipyrad.core.sample.Sample object at 0x7f3d75a414d0>
2F_0 2F_0 <ipyrad.core.sample.Sample object at 0x7f3d759b0d10>
1C_0 1C_0 <ipyrad.core.sample.Sample object at 0x7f3d759cf090>
1D_0 1D_0 <ipyrad.core.sample.Sample object at 0x7f3d759cf8d0>
2H_0 2H_0 <ipyrad.core.sample.Sample object at 0x7f3d759c3b90>
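
To see why the key and the name attribute both need attention, here is a minimal sketch that does not use ipyrad at all. `FakeSample` is a hypothetical stand-in class that only stores a name; real Sample objects carry much more state. The point is only that assigning to `.name` does nothing to the dictionary key.

```python
class FakeSample(object):
    """Hypothetical stand-in for an ipyrad Sample; it only stores a name."""
    def __init__(self, name):
        self.name = name

## a plain dict playing the role of data.samples (an assumption for this sketch)
samples = {"1A_0": FakeSample("1A_0")}

## change only the name attribute
samples["1A_0"].name = "1A_X"

## the key is untouched, so key and name now disagree
stale_keys = [key for key, val in samples.items() if key != val.name]
print(stale_keys)
```

A list like `stale_keys` that is not empty means some Sample can no longer be looked up by its current name.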


### How can Sample names be changed?
Sample names can be changed by modifying the samples dictionary of an Assembly object. However, dictionary keys cannot be edited in place: a key has to be removed and its value re-inserted under a new key. For that reason I would recommend using one of the methods shown below. 

### Replacing a single name

In [5]:
## use pop to select one Sample by its key, which
## also removes that key and value pair from the dict
sample = data.samples.pop("1A_0")

## now update its sample.name attribute to a new value
sample.name = "1A_X"

## now put the Sample back into the dict w/ a matching key
data.samples[sample.name] = sample

In [6]:
## look at the change
for key, val in data.samples.items():
    print("{} {} {}".format(key, val.name, val))

2G_0 2G_0 <ipyrad.core.sample.Sample object at 0x7f3d759c31d0>
3K_0 3K_0 <ipyrad.core.sample.Sample object at 0x7f3d75a41a90>
3J_0 3J_0 <ipyrad.core.sample.Sample object at 0x7f3d75a21f10>
2E_0 2E_0 <ipyrad.core.sample.Sample object at 0x7f3d75a3b190>
1A_X 1A_X <ipyrad.core.sample.Sample object at 0x7f3d759dcf10>
1B_0 1B_0 <ipyrad.core.sample.Sample object at 0x7f3d759dc850>
3I_0 3I_0 <ipyrad.core.sample.Sample object at 0x7f3d75a49350>
3L_0 3L_0 <ipyrad.core.sample.Sample object at 0x7f3d75a414d0>
2F_0 2F_0 <ipyrad.core.sample.Sample object at 0x7f3d759b0d10>
1C_0 1C_0 <ipyrad.core.sample.Sample object at 0x7f3d759cf090>
1D_0 1D_0 <ipyrad.core.sample.Sample object at 0x7f3d759cf8d0>
2H_0 2H_0 <ipyrad.core.sample.Sample object at 0x7f3d759c3b90>
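
If you rename Samples often, the pop, rename, and re-insert steps can be wrapped in a small helper. The sketch below is not part of ipyrad: `rename_sample` and `FakeSample` are hypothetical names, and the sketch assumes only that the container is a dict whose values have a `name` attribute. It also refuses to overwrite an existing key, which a bare `data.samples[new] = sample` would do silently.

```python
class FakeSample(object):
    """Hypothetical stand-in for an ipyrad Sample; it only stores a name."""
    def __init__(self, name):
        self.name = name

def rename_sample(samples, old, new):
    """Move the object stored under key `old` to key `new` and update its name."""
    if old not in samples:
        raise KeyError(old)
    if new != old and new in samples:
        raise ValueError("a Sample named {} already exists".format(new))
    ## pop removes the old key and returns the stored object
    sample = samples.pop(old)
    sample.name = new
    ## re-insert the object under the matching new key
    samples[new] = sample
    return sample

samples = {name: FakeSample(name) for name in ("1A_0", "1B_0")}
rename_sample(samples, "1A_0", "1A_X")
print(sorted(samples))
```

Both checks run before anything is popped, so a failed rename leaves the dictionary unchanged.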


### Modifying all names
Problems arise when you try to modify a dictionary while you are iterating over it. The simplest way around this is to copy the keys you wish to iterate over into a list before starting the loop. This way the keys being iterated over will not change during the iteration. Below we first store the key names in a list called names. Then we iterate over names and replace each key and Sample name the same way as above.  

In [None]:
## get list of keys in dict
names = list(data.samples.keys())

## iterate over keys in names
for samplekey in names:
    ## use pop to select the Sample by its key
    sample = data.samples.pop(samplekey)

    ## now update the sample.name attribute to a new val
    ## here we add the prefix "New_" to each sample name
    sample.name = "New_"+sample.name

    ## now put the Sample back into the dict w/ a matching key
    data.samples[sample.name] = sample
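
The reason for copying the keys into a list first can be seen with a plain dict standing in for `data.samples` (an assumption made so the sketch runs without ipyrad). In Python 3, `.keys()` returns a live view that follows every change to the dictionary, whereas `list(...)` takes an independent snapshot that is safe to loop over while keys are popped and re-inserted.

```python
## plain dict standing in for data.samples (values are placeholders)
samples = {"1A_0": "sample A", "1B_0": "sample B"}

live = samples.keys()      ## Python 3: a view that tracks the dict
snapshot = list(samples)   ## an independent copy of the current keys

## rename every key by looping over the snapshot, not the dict itself
for key in snapshot:
    samples["New_" + key] = samples.pop(key)

print(sorted(snapshot))    ## the old names, unchanged by the loop
print(sorted(live))        ## the view reflects the renamed keys
```

Looping over the snapshot visits each original key exactly once, so no name can be prefixed twice.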

#### Print the results for a sanity check
Compare the original fastq file name with the new name.

In [10]:
## print the results
for key, val in data.samples.items():
    ## print the original fastq file for this sample
    print(val.files.fastqs[0][0])
    
    ## print its new key and sample.name
    print("{} {} \n".format(key, val.name))

test_rad/data1_fastqs/2G_0_R1_.fastq.gz
New_2G_0 New_2G_0 

test_rad/data1_fastqs/3K_0_R1_.fastq.gz
New_3K_0 New_3K_0 

test_rad/data1_fastqs/3J_0_R1_.fastq.gz
New_3J_0 New_3J_0 

test_rad/data1_fastqs/1B_0_R1_.fastq.gz
New_1B_0 New_1B_0 

test_rad/data1_fastqs/2F_0_R1_.fastq.gz
New_2F_0 New_2F_0 

test_rad/data1_fastqs/3L_0_R1_.fastq.gz
New_3L_0 New_3L_0 

test_rad/data1_fastqs/3I_0_R1_.fastq.gz
New_3I_0 New_3I_0 

test_rad/data1_fastqs/2H_0_R1_.fastq.gz
New_2H_0 New_2H_0 

test_rad/data1_fastqs/1A_0_R1_.fastq.gz
New_1A_X New_1A_X 

test_rad/data1_fastqs/1C_0_R1_.fastq.gz
New_1C_0 New_1C_0 

test_rad/data1_fastqs/1D_0_R1_.fastq.gz
New_1D_0 New_1D_0 

test_rad/data1_fastqs/2E_0_R1_.fastq.gz
New_2E_0 New_2E_0 

