# Rates, Pressure, Temperature DataFrame Creator

This is a simple notebook that produces a single dataframe containing the data you want to analyse.<br>
1. It uses the two environmental files to create a dataframe with three columns:<br>
gpstime | Pressure | Temperature<br>
<br>
2. It uses the muon files to extract a second dataframe with two columns:<br>
unixtime | frequency (the muon rate)<br>
<br>
3. In the final step it joins the two dataframes into one, based on the timestamp<br>
(the gpstime is converted to unixtime first).
<br>
You may want to export the final dataframe to a csv file to use in your analysis.

In [1]:
from datetime import datetime
import numpy as np
import pandas as pd
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
pd.set_option('display.float_format', '{:.10g}'.format)
import os
from matplotlib import pyplot as plt

In [3]:
def TimestampConvGPStoUNIX(gps_timestamp,leap_secs=18):
    #Converts the gpstimestamp to unixtimestamp
    unix_timestamp = gps_timestamp + 315964800 - leap_secs
    
    return unix_timestamp
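
The constant 315964800 is the GPS epoch (1980-01-06 00:00:00 UTC) expressed as a UNIX timestamp, and 18 is the GPS-UTC leap-second offset in force since 2017. As a quick sanity check, the sketch below converts the first gpstime of the environmental files and prints the corresponding UTC date. The names `GPS_EPOCH_UNIX` and `gps_to_unix` are illustrative, and the fixed offset of 18 s is an assumption that only holds for data taken after 1 January 2017.

```python
from datetime import datetime, timezone

# GPS epoch (1980-01-06 00:00:00 UTC) as a UNIX timestamp
GPS_EPOCH_UNIX = 315964800

def gps_to_unix(gps_timestamp, leap_secs=18):
    # Same arithmetic as TimestampConvGPStoUNIX; leap_secs=18 is assumed valid
    # only for timestamps after 2017-01-01
    return gps_timestamp + GPS_EPOCH_UNIX - leap_secs

first_gps = 1265673618  # first gpstime in the environmental files
first_unix = gps_to_unix(first_gps)
first_utc = datetime.fromtimestamp(first_unix, tz=timezone.utc)

print(first_unix, first_utc.isoformat())
# 1581638400 2020-02-14T00:00:00+00:00
```

The resulting date agrees with the 20200214 stamp in the muon file names used later in the notebook.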

## 1. It uses the two environmental files to create a dataframe with 3 columns

In [6]:
#Create the file catalog for the environmental files that will be read
#Here I use the names and path of my own files; you should change these to yours
directory="/Users/theoavg/Desktop/VirgoO3_Analysis/dataO3/"
file_catalog=[directory+"ENV_METEO_PRES_1265673618_1641600.txt", directory+"ENV_METEO_TE_1265673618_1641600.txt"]

feature_catalog=["gpstime","Pressure","Temperature"]

In [7]:
# Read the first file from the file_catalog and create the dataframe named data with two columns
# feature_catalog[0] | feature_catalog[1], which for this specific example are
# "gpstime" | "Pressure"
# The separator is a raw string (a plain "\s" is an invalid escape sequence); "\s+" also accepts runs of whitespace
i=0
data = pd.read_csv(file_catalog[i], 
                   sep=r"\s+", header = None, names=[feature_catalog[0],feature_catalog[i+1]], engine='python')

# Use a for loop to read the rest of the files in the file_catalog list
# In this example there is only one extra file (the framework was developed for many files)

for i in range(1,len(file_catalog)):
    data_to_append = pd.read_csv(file_catalog[i], 
                   sep=r"\s+", header = None, names=[feature_catalog[0],feature_catalog[i+1]], engine='python')
    
    # Append the last column of the data_to_append dataframe to the initial data dataframe we created above
    data=data.join(data_to_append.iloc[:,1])
    #data.head()
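
The loop above attaches each new column by row position, which is only safe when every file covers exactly the same timestamps in the same order. A minimal sketch of aligning on `gpstime` instead is shown below; the in-memory data, including the deliberately missing row, is made up for illustration and is not taken from the real files.

```python
import io
import pandas as pd

# Toy stand-ins for the two environmental files (illustrative values only)
pres_txt = "100 1019.5\n101 1019.5\n102 1019.4\n"
temp_txt = "100 12.72\n102 12.70\n"  # gpstime 101 is missing on purpose

pres = pd.read_csv(io.StringIO(pres_txt), sep=r"\s+", header=None,
                   names=["gpstime", "Pressure"])
temp = pd.read_csv(io.StringIO(temp_txt), sep=r"\s+", header=None,
                   names=["gpstime", "Temperature"])

# Positional join: the second temperature lands on the wrong timestamp (101)
positional = pres.join(temp.iloc[:, 1])

# Key-based merge: rows are matched on gpstime, gaps become NaN
merged = (pres.merge(temp, on="gpstime", how="outer")
              .sort_values("gpstime")
              .reset_index(drop=True))

print(positional)
print(merged)
```

If the files are known to be perfectly aligned, as the matching row counts here suggest, the positional join gives the same result and is cheaper.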

In [11]:
# The environmental dataframe created from the two files
data.head()
data.tail()

,gpstime,Pressure,Temperature
0,1265673618,1019.5,12.72000027
1,1265673619,1019.5,12.72000027
2,1265673620,1019.5,12.72000027
3,1265673621,1019.5,12.72000027
4,1265673622,1019.5,12.72000027


,gpstime,Pressure,Temperature
1641595,1267315213,1010.400024,6.21999979
1641596,1267315214,1010.400024,6.21999979
1641597,1267315215,1010.400024,6.21999979
1641598,1267315216,1010.400024,6.21999979
1641599,1267315217,1010.400024,6.21999979


In [26]:
# Save the dataframe to a csv file so that the steps above do not have to be repeated
os.makedirs("EnvironmentalData", exist_ok=True) # create the output folder if it does not exist yet
data.to_csv("EnvironmentalData/GPSPresTemp_dataset.csv",index=False)

## 2. It uses the muon files to extract a second dataframe with two columns

In [13]:
def datafiletodataframe(datafile):
    # Converts a detector datafile into a dataframe whose first column is the unix timestamp and whose second
    # column is the number of events registered during that second. Since this is a number of events per second,
    # the column is named frequency; physically it is the muon rate.
    # The datafile has many columns; here we declare the maximum number that the reader is going to find.
    # Ask for more details if needed.
    names2read=["unixtime","eventnum","finetime","tof1","tof2","plane_num","crap1","crap2"]+[str(i) for i in range(130)]
    columns=["unixtime","eventnum","finetime","tof1","tof2","plane_num"] # the important columns to keep in the dataframe
    # pd.read_csv is not limited to csv files; here it reads whitespace-separated .dat files
    # (sep=r"\s+" is equivalent to delim_whitespace=True, which newer pandas versions deprecate)
    dataframe= pd.read_csv(datafile, sep=r"\s+", header = None, names=names2read, engine='python')
    dataframe=dataframe[columns]
    
    # the following lines count the number of muon events per unix timestamp and return the dataframe for the given datafile
    dataframe=dataframe.loc[dataframe["plane_num"]==32]
    dataframe_freq=pd.DataFrame(dataframe["unixtime"].value_counts(sort=False))
    dataframe_freq = dataframe_freq.sort_index()
    dataframe_freq = dataframe_freq.reset_index()
    dataframe_freq.columns = ['unixtime', 'frequency']
    
    return dataframe_freq
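
To make the counting step concrete, the sketch below applies the same idea to a handful of made-up events: keep the rows with `plane_num` equal to 32 (the filter value used in the function above), then count the rows per `unixtime`. It uses `groupby(...).size()` rather than `value_counts`, which is an equivalent alternative and not the notebook's own code.

```python
import pandas as pd

# Toy event list (illustrative values): one row per registered event
events = pd.DataFrame({
    "unixtime":  [10, 10, 11, 10, 12, 12],
    "plane_num": [32, 32, 32, 7, 32, 32],
})

# Keep only the events that pass the same plane_num filter as above
selected = events.loc[events["plane_num"] == 32]

# Count events per second; the result is sorted by unixtime
rates = selected.groupby("unixtime").size().reset_index(name="frequency")

print(rates)
```

The event at unixtime 10 with `plane_num` 7 is discarded, so that second ends up with a rate of 2 rather than 3.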

In [25]:
#The file below is opened in append mode, so rerunning this cell adds a second header and duplicate rows.
#Uncomment the following line to remove the file before recreating it from scratch
#os.remove("muondetpersec_test.csv")

with open("muondetpersec_test.csv","a") as fout: #opens the file to append ("a") things to it
    df_header=pd.DataFrame(columns=['unixtime', 'frequency']) # empty dataframe with just a header to put to fout
    df_header.to_csv(fout,index=False) #write the empty dataframe to fout. It only adds the header values
    
    # a for loop that goes through all the muon detector files to extract the muon rates ("frequency") per second
    # Each frequency value is paired with the corresponding unix timestamp
    
    numoffiles=200 #the total number of files sharing the same name prefix (here: EGO-central-building-zen00-20200214-) that you are going to use
    
    # If the names of your datafiles differ in anything other than the trailing number, either run this cell
    # once per name prefix or change the code to take this into account
    
    for i in range(numoffiles):
        if i%100==0: print(i) # print the file number to show that the extraction is progressing
        datafile="/Users/theoavg/Desktop/VirgoO3_Analysis/marteau_data_files/DetDatFiles/EGO-central-building-zen00-20200214-"+str(i)+".dat"
        df_out=datafiletodataframe(datafile)
        df_out.to_csv(fout,header=False,index=False)
        #df_final=df_final.append(df_out,ignore_index=True)
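
The cell above writes the header every time it runs, so building the file in several passes leaves extra header rows in it. One way around this is to write the header only when the file is new or empty. The sketch below shows the idea with a hypothetical helper `append_rates` and a temporary file; neither is part of the original notebook.

```python
import os
import tempfile
import pandas as pd

def append_rates(df, csv_path):
    # Hypothetical helper: write the header only if the file is missing or empty
    write_header = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    df.to_csv(csv_path, mode="a", header=write_header, index=False)

# Temporary location so the demo does not touch real data
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "rates_demo.csv")

# Two toy chunks standing in for the output of two datafiles
chunk1 = pd.DataFrame({"unixtime": [10, 11], "frequency": [2, 1]})
chunk2 = pd.DataFrame({"unixtime": [12], "frequency": [2]})

append_rates(chunk1, csv_path)
append_rates(chunk2, csv_path)

result = pd.read_csv(csv_path)
print(result)
```

With this pattern the extraction can be resumed or run in segments without producing a malformed csv file, although rows already written are still duplicated if the same datafile is processed twice.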

0
100


## 3. Final step: join the two dataframes into one based on the timestamp

In [None]:
def TimestampConvGPStoUNIX(gps_timestamp,leap_secs=18):
    # converts the gps timestamp to unix timestamp
    unix_timestamp = gps_timestamp + 315964800 - leap_secs
    
    return unix_timestamp

In [27]:
env_filepath="EnvironmentalData/GPSPresTemp_dataset.csv" #the environmental parameters file to read
env_df=pd.read_csv(env_filepath) #read the file into a dataframe
env_df["unixtime"]=TimestampConvGPStoUNIX(env_df["gpstime"]) #add the unixtime column to the dataframe (the conversion is plain arithmetic, so it works on the whole column at once)

In [29]:
env_df.head()

,gpstime,Pressure,Temperature,unixtime
0,1265673618,1019.5,12.72000027,1581638400
1,1265673619,1019.5,12.72000027,1581638401
2,1265673620,1019.5,12.72000027,1581638402
3,1265673621,1019.5,12.72000027,1581638403
4,1265673622,1019.5,12.72000027,1581638404


In [30]:
env_uxidx=env_df.set_index("unixtime") #set the dataframe index to unixtime
env_uxidx=env_uxidx.drop(columns="gpstime") # Drop the gpstime column since it is irrelevant for the following steps

In [31]:
env_uxidx.head()

unixtime,Pressure,Temperature
1581638400,1019.5,12.72000027
1581638401,1019.5,12.72000027
1581638402,1019.5,12.72000027
1581638403,1019.5,12.72000027
1581638404,1019.5,12.72000027


In [33]:
muon_filepath="muondetpersec_test.csv"
muon_df=pd.read_csv(muon_filepath)
muon_uxidx=muon_df.set_index("unixtime")

In [34]:
muon_uxidx.head()

unixtime,frequency
1581671837,21
1581671838,18
1581671839,15
1581671840,18
1581671841,16


In [35]:
#join the two datasets based on the unixtimes of the environmental parameters dataset
data_total=env_uxidx.join(muon_uxidx,how="left")
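
A left join keeps every row of the environmental dataframe and fills `frequency` with NaN wherever the muon dataframe has no entry for that unixtime. The sketch below shows this on made-up data and contrasts it with an inner join, which would keep only the timestamps present in both; the toy values are illustrative.

```python
import pandas as pd

# Toy environmental data indexed by unixtime (illustrative values)
env = pd.DataFrame(
    {"Pressure": [1019.5, 1019.5, 1019.4]},
    index=pd.Index([100, 101, 102], name="unixtime"),
)

# Toy muon rates; 103 has no environmental counterpart, 100 and 102 have no rate
muon = pd.DataFrame(
    {"frequency": [21, 18]},
    index=pd.Index([101, 103], name="unixtime"),
)

left = env.join(muon, how="left")    # all env rows, NaN where no rate exists
inner = env.join(muon, how="inner")  # only timestamps present in both

print(left)
print(inner)
```

Note that the NaN values turn the `frequency` column into floats in the left join, which is why the joined dataframe reports it as float64.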

In [36]:
data_total.head() # The NaN values are due to missing muon data at those timestamps, and to the fact that I used only 200 muon data files instead of the 2600 available

unixtime,Pressure,Temperature,frequency
1581638400,1019.5,12.72000027,
1581638401,1019.5,12.72000027,
1581638402,1019.5,12.72000027,
1581638403,1019.5,12.72000027,
1581638404,1019.5,12.72000027,


In [37]:
data_total.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1641600 entries, 1581638400 to 1583279999
Data columns (total 3 columns):
Pressure       1641600 non-null float64
Temperature    1641600 non-null float64
frequency      118845 non-null float64
dtypes: float64(3)
memory usage: 130.1 MB
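
The summary above shows 118845 non-null `frequency` values out of 1641600 rows, so only about 7% of the seconds have a muon rate. The sketch below computes that fraction and shows, on a made-up dataframe, how the same number can be read off with `notna().mean()` and how the rows without a rate can be dropped before analysis.

```python
import numpy as np
import pandas as pd

# Fraction of rows with a muon rate, from the counts reported by info()
reported_coverage = 118845 / 1641600
print(f"{reported_coverage:.2%}")  # 7.24%

# Toy joined dataframe (illustrative values) with gaps in frequency
toy = pd.DataFrame(
    {"Pressure": [1019.5, 1019.5, 1019.4, 1019.4],
     "frequency": [np.nan, 21, np.nan, 18]},
    index=pd.Index([100, 101, 102, 103], name="unixtime"),
)

# Same coverage figure computed directly from a dataframe
coverage = toy["frequency"].notna().mean()

# Keep only the seconds for which a muon rate exists
matched = toy.dropna(subset=["frequency"])

print(coverage)
print(matched)
```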


In [38]:
#export the dataframe to a csv file; unixtime is the index, so the index is written as the first column
data_total.to_csv("TotalData.csv")
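
Because `unixtime` is the index of the joined dataframe, whether it ends up in the csv file depends on the `index` argument of `to_csv`. The sketch below illustrates the difference on a one-row toy dataframe and shows how to restore the index when reading the file back; the values are illustrative.

```python
import io
import pandas as pd

# One-row toy dataframe indexed by unixtime (illustrative values)
df = pd.DataFrame(
    {"Pressure": [1019.5], "frequency": [21.0]},
    index=pd.Index([1581671837], name="unixtime"),
)

without_index = df.to_csv(index=False)  # the unixtime column is lost
with_index = df.to_csv()                # unixtime is written as the first column

# Reading back and restoring unixtime as the index
restored = pd.read_csv(io.StringIO(with_index), index_col="unixtime")

print(without_index)
print(with_index)
```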