# Import Data
This script will import data into a dataframe called *data*. 

The folder containing the measurement results contains a single *csv* file. This file has two sections: the first rows contain general information about the measurement (voltage, magnification, etc.), whereas the second part lists the discovered particles with the collected data.

To import the data, the only csv file in the main directory is located first ([Search CSV](#Search-CSV)). The header section is then read one line at a time ([Headers](#Headers)); if a line contains important information, that information is printed. This line-by-line scan continues until the column headers are identified. These headers are stored temporarily, after which the remaining particle data is loaded into an array ([Particle data](#Particle-data)). To make the data easier to handle, the array is converted into a dataframe that uses the stored column names. Next, only particles from the given stub are selected ([Stub selection](#Stub-selection)). Finally, to prepare the data for later use, some columns are copied into new columns ([Special data columns](#Special-data-columns)).

#### Preparation
As this script is designed for external use but should also be usable stand-alone, a number of variables are assigned default values in case they do not yet exist.

In [1]:
# Import modules
from os import listdir
import numpy as np
import pandas as pd

In [10]:
# directory
try:
    directory
except NameError:
    directory = "F:\\PA_UC\\"
    print("Directory not specified, set to "+directory)

# stub
try:
    stub
except NameError:
    stub = 1
    print("Stub not specified, set to "+str(stub))

#### Search CSV
First, the csv file in the directory is selected and its path is stored in the variable *file*. The directory is expected to contain only one csv file; if it contains several, the last one listed is used.

In [5]:
for i in listdir(directory):
    if i.endswith(".csv"):
        file = directory+i

print("File: "+file)

File: F:\PA_UC\pa_uc.csv
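
The loop above assumes that exactly one csv file exists: with none, *file* is never assigned, and with several, the last one listed is used. A minimal sketch of a stricter lookup is shown below. The helper name `find_csv` is hypothetical and not part of the original script, and unlike the loop above it matches the extension case-insensitively. The demo runs on a temporary folder with made-up file names.

```python
import tempfile
from pathlib import Path


def find_csv(folder):
    """Return the path (as a string) of the only .csv file in `folder`.

    Hypothetical helper: raises ValueError if there is not exactly one match.
    """
    matches = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".csv"
    )
    if len(matches) != 1:
        raise ValueError(
            "Expected exactly one csv file in {}, found {}".format(folder, len(matches))
        )
    return str(matches[0])


# Demo on a temporary folder containing one csv file and one other file
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "pa_uc.csv").write_text("Part#,Field#\n")
    (Path(tmp) / "notes.txt").write_text("not a csv\n")
    found = find_csv(tmp)
    print("File: " + found)
```

In the notebook itself this would be called as `file = find_csv(directory)`.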


#### Headers
Then, the file is opened and the header lines are scanned. For some entries (such as magnification and voltage) the value is printed; the others are skipped. Finally, the line containing the column names (starting with *Part#*) is found. This line is split at the commas and the surrounding spaces are stripped from each name. The resulting list is used later. The number of lines to skip before reaching the particle data is now also known.

In [6]:
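with open(file) as f:
    # Scan at most the first 20 lines for header entries
    for i in range(0, 20):
        line = f.readline().replace("\n", "").split(',')
        if(line[0].strip()=="Part#"):
            # Column header line: keep the names and the number of lines to skip
            columns = line
            skip=i+1
            break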
        elif(line[0].strip()=="Date"):
            print("Date: "+str(line[2].strip())+"."+str(line[1].strip())+"."+str(line[3].strip()))
        elif(line[0].strip()=="Acc."):
            print("Voltage: "+str(line[2].strip())+" kV")
        elif(line[0].strip()=="Magn:"):
            print("Magnification: "+str(line[1].strip())+"x")
        elif(line[0].strip()=="Preset"):
            print("Measurement time: "+str(line[2].strip())+" s")

print('Number of columns: '+str(len(columns)))
for i in range(len(columns)):
    columns[i] = columns[i].strip()

Date: 13.12.2016
Voltage: 10.0 kV
Magnification: 2500x
Measurement time: 30.0 s
Number of columns: 28
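
The header scan can be tried without the measurement file by running it on a small in-memory sample. The sketch below is only an illustration: the sample text is made up and much shorter than a real header, the function name `read_header` is hypothetical, and only the *Magn:* entry is handled. It also raises an error when no *Part#* line is found, which the cell above does not do.

```python
import io

# Made-up header excerpt; the layout of a real file may differ.
sample = io.StringIO(
    "Magn:, 2500\n"
    "Part#, Field#, StgX, StgY\n"
    "1, 10001, 0.5, 1.5\n"
)


def read_header(f, max_lines=20):
    """Scan up to `max_lines` lines and return (info, columns, skip)."""
    info = {}
    for i in range(max_lines):
        raw = f.readline()
        if not raw:  # end of file reached
            break
        parts = [p.strip() for p in raw.rstrip("\n").split(",")]
        if parts[0] == "Part#":
            # Column names found; i + 1 lines precede the particle data
            return info, parts, i + 1
        if parts[0] == "Magn:":
            info["magnification"] = parts[1]
    raise ValueError("No line starting with 'Part#' found")


info, columns, skip = read_header(sample)
print(info, columns, skip)
```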


#### Particle data
The particle data is present as a long list, with a single particle on each line and the data separated by commas. The whole list is loaded into a numpy array. The array is then converted into a pandas dataframe using the previously obtained column names as header. This allows columns to be selected more easily, especially since the columns differ sometimes between measurements (mostly due to selected EDX elements).

In [7]:
data = np.loadtxt(file, delimiter=',', skiprows=skip, comments='_')
print("Number of particles: "+str(len(data)))

# Transfer data into a dataframe
data = pd.DataFrame(data, columns=columns)
    
#print(data)

Number of particles: 3639
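
As an alternative to loading via numpy first, pandas can read the particle section directly. The sketch below assumes the same `columns` and `skip` values as obtained from the header scan, but uses a made-up miniature file. It does not reproduce the `comments='_'` setting of the cell above, and `read_csv` infers a data type per column instead of returning all values as floats.

```python
import io
import pandas as pd

# Made-up miniature file: one header line, the column line, two particles.
text = (
    "Magn:, 2500\n"
    "Part#, Field#, AvgDiam\n"
    "1, 10001, 0.8\n"
    "2, 20003, 1.2\n"
)
columns = ["Part#", "Field#", "AvgDiam"]  # as obtained from the header scan
skip = 2                                   # lines before the particle data

# header=None: the column line is skipped, so the names are passed explicitly
df = pd.read_csv(
    io.StringIO(text),
    skiprows=skip,
    header=None,
    names=columns,
    skipinitialspace=True,
)
print("Number of particles: " + str(len(df)))
```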


#### Stub selection
Since a particle analysis can be performed over multiple samples (*stubs*), only a single stub is imported. The *Field#* column encodes both the stub and the field: it equals 10000 times the stub number plus the field number. The dataframe is therefore extended with the stub and field number, and only the particles from the selected stub are kept.

In [8]:
# Get stub for each particle
data["stub"] = np.rint(np.floor(data["Field#"]/10000))

# Only select particles from selected stub (copy so that adding columns below does not trigger a SettingWithCopyWarning)
data = data[data["stub"]==stub].copy()

# Get field number
data["fieldnum"] = np.rint(np.floor(data["Field#"]-10000*stub))

print("Particles on stub "+str(stub)+": "+str(len(data)))

#print("Number of fields: "+str(len()))

Particles on stub 1: 829
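
The decoding of *Field#* used above can be checked on a few made-up values. This is a small worked example, assuming that field numbers stay below 10000 so that integer division and remainder separate the two parts cleanly.

```python
import pandas as pd

# Made-up Field# values: fields 1 and 2 on stub 1, field 3 on stub 2.
demo = pd.DataFrame({"Field#": [10001.0, 10002.0, 20003.0]})

demo["stub"] = demo["Field#"] // 10000     # stub number
demo["fieldnum"] = demo["Field#"] % 10000  # field number on that stub

# Keep only stub 1; copy so that later column assignments are safe
selected = demo[demo["stub"] == 1].copy()
print(selected)
```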


#### Special data columns
In order to prepare the data for some future functions (e.g. ImageJ particle analysis and particle relocalization), some data is copied into specific columns. At present, the following columns are copied:
- *X* <- *StgX* : To be used for relocalization (e.g. apply translation and rotation)
- *Y* <- *StgY* : To be used for relocalization (e.g. apply translation and rotation)
- *d* <- *AvgDiam* : To be used for alternative calculations (e.g. ImageJ)
- *A* <- *Aspe* : To be used for alternative calculations (e.g. ImageJ)

In [9]:
data["X"] = data["StgX"]
data["Y"] = data["StgY"]
data["d"] = data["AvgDiam"]
data["A"] = data["Aspe"]
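
Because the available columns can differ between measurements, the copies can also be driven by a mapping that skips source columns which are absent. The following is only a sketch on a made-up two-particle dataframe in which *Aspe* is missing; the names `aliases` and `missing` are illustrative and not part of the original script.

```python
import pandas as pd

# New column -> source column, as listed above
aliases = {"X": "StgX", "Y": "StgY", "d": "AvgDiam", "A": "Aspe"}

# Made-up dataframe without the "Aspe" column
demo = pd.DataFrame(
    {"StgX": [0.1, 0.2], "StgY": [1.0, 2.0], "AvgDiam": [0.8, 1.2]}
)

# Report source columns that are not available in this measurement
missing = [src for src in aliases.values() if src not in demo.columns]
print("Missing source columns: " + str(missing))

# Copy only the columns that exist
for alias, src in aliases.items():
    if src in demo.columns:
        demo[alias] = demo[src]
```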

In [25]:
# Just to test some output options:
#from IPython.display import HTML
#HTML(data.to_html())