# Importing data from (highly specific) .dat files

### Imports

In [111]:
import pandas as pd

### Function that loads in a .dat file and returns a dataframe

This function is a bit hideous, sorry, but it does (as far as I understand the task) work.

Essentially, it works as follows:

- The function takes a filename (including the `.dat` extension).
- It loads the file as raw text.
- It splits each line at the first space and appends the two pieces (heading and value) to `values` as a pair.
- `values` is then transformed into two lists (headings and values).
- The lists are converted into a dataframe with the headings as the index and the values as the sole column; this is then transposed so that the headings become the columns of a single row.
- The function returns the transposed dataframe.
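To make the splitting and regrouping steps concrete, here is a minimal sketch on a few in-memory lines. The sample lines are hypothetical (modelled on the output shown further down, not copied from a real file), and the sketch uses `str.partition` for brevity rather than the split-and-rejoin approach in the function itself.

```python
# Hypothetical sample lines; the real file contents may differ.
sample_lines = [
    "#TITLE Probability of residues in micelle\n",
    "#TIME 99\n",
    "239 0\n",
]

values = []
for line in sample_lines:
    # Split at the first space only: heading on the left, everything else on the right
    heading, _, value = line.strip().partition(" ")
    values.append((heading, value.strip()))

# Regroup the list of pairs into one tuple of headings and one tuple of values
headings, row = zip(*values)

print(headings)  # ('#TITLE', '#TIME', '239')
print(row)       # ('Probability of residues in micelle', '99', '0')
```

Every value is still a string at this point, which is why the resulting dataframe needs cleaning afterwards.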

In [112]:
def create_dataframe_from_dat(filename):
    
    # Open the file
    with open(filename) as file:
        
        # Holder for data from each line
        values = []
        
        # Go through the file line-by-line
        for line in file:
            
            # For each line,
            
            # Remove excess spaces at either end of the line, then split it by spaces, then
            # remove excess spaces at either end of each element
            split_line = [x.strip() for x in line.strip().split(" ")]
            
            # Add a new element to values, containing the first bit of split_line
            # (the column heading) and all the other bits joined back together
            values.append((split_line[0], " ".join(split_line[1:])))
            
        # Convert the list of column/value pairs to two lists, one of headings and one of values
        data = list(zip(*values))
        
        # Build a dataframe with the headings as the index, then transpose it so that
        # the headings become the columns of a single row
        df = pd.DataFrame(index=data[0], data=data[1])
        
        # Return the dataframe
        return df.T

To actually use the function, just pass in any filename (assuming all your `.dat` files have the same internal structure).

In [113]:
df = create_dataframe_from_dat("HIST_Micelle_Residue_Size_[shape]_BT00000100_FT00000099.dat")

df.head()

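| Unnamed: 0 | #TITLE | #TIME | #BLOCKLENGTH | #SAMPLEPOINTS | #TYPE | #COLUMN | #COLUMN.1 | #HIST | #LOWER_LIMIT | #UPPER_LIMIT | ... | 230 | 231 | 232 | 233 | 234 | 235 | 236 | 237 | 238 | 239 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Probability of residues in micelle | 99 | 100 | 2691 | 1 Probability Density | 1 Number of residues | 2 Time average probability density function | DETAILS 1 239 1.000000 0.500000 239.500000 | 4.0 | 228.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |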


As is probably clear, there's a lot of cleaning to be done, but hopefully having it in a more familiar format helps with that. 
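One likely first cleaning step is converting the numeric columns, since the loader stores every value as a string. The sketch below is only an illustration: `convert_numeric_columns` is a hypothetical helper, the small frame is made-up stand-in data, and it assumes unique column names. The output above suggests the `#COLUMN` heading occurs twice, so those duplicates would need renaming first.

```python
import pandas as pd

# Hypothetical one-row frame mimicking the loader's output: every value is a string
raw = pd.DataFrame(
    [
        {
            "#TITLE": "Probability of residues in micelle",
            "#TIME": "99",
            "#SAMPLEPOINTS": "2691",
            "239": "0",
        }
    ]
)


def convert_numeric_columns(df):
    """Return a copy of df in which fully numeric string columns become numbers.

    Assumes column names are unique.
    """
    cleaned = df.copy()
    for name in cleaned.columns:
        # Values that cannot be parsed as numbers become NaN
        converted = pd.to_numeric(cleaned[name], errors="coerce")
        # Only replace the column if every value parsed successfully
        if converted.notna().all():
            cleaned[name] = converted
    return cleaned


cleaned = convert_numeric_columns(raw)
print(cleaned.dtypes)
```

Columns such as `#TITLE` that contain free text are left untouched, because at least one of their values fails to parse.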

### Joining separate files into one dataframe

I don't know if you actually want to do this, but the code below shows how to combine several files into one dataframe with one row per file.

In [114]:
# Create two one-row dataframes

a = create_dataframe_from_dat("HIST_Micelle_Residue_Size_[shape]_BT00000100_FT00000099.dat")
b = create_dataframe_from_dat("HIST_Micelle_Residue_Size_[shape]_BT00000100_FT00000099.dat")

In [115]:
# Combine the two (or more) dataframes together by passing them as a list to pd.concat

new_df = pd.concat([a,b])

# Inspect the new df

new_df.head()

| Unnamed: 0 | #TITLE | #TIME | #BLOCKLENGTH | #SAMPLEPOINTS | #TYPE | #COLUMN | #COLUMN.1 | #HIST | #LOWER_LIMIT | #UPPER_LIMIT | ... | 230 | 231 | 232 | 233 | 234 | 235 | 236 | 237 | 238 | 239 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Probability of residues in micelle | 99 | 100 | 2691 | 1 Probability Density | 1 Number of residues | 2 Time average probability density function | DETAILS 1 239 1.000000 0.500000 239.500000 | 4.0 | 228.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | Probability of residues in micelle | 99 | 100 | 2691 | 1 Probability Density | 1 Number of residues | 2 Time average probability density function | DETAILS 1 239 1.000000 0.500000 239.500000 | 4.0 | 228.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
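Both rows in the output above carry the index label 0, because each one-row dataframe brings its own index along. If unique row labels would be more convenient, `pd.concat` accepts `ignore_index=True`. A minimal sketch, using two small made-up frames as stand-ins for the loader's output:

```python
import pandas as pd

# Hypothetical stand-ins for two one-row dataframes returned by the loader
a = pd.DataFrame([{"#TIME": "99", "239": "0"}])
b = pd.DataFrame([{"#TIME": "199", "239": "0"}])

# ignore_index=True discards each frame's own index and numbers the rows 0..n-1
combined = pd.concat([a, b], ignore_index=True)

print(combined.index.tolist())  # [0, 1]
```

Whether to renumber is a matter of taste; keeping the original labels is harmless as long as nothing later relies on looking rows up by index.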
