`pandas` is a Python package created to aid with data analysis.

## Importing `pandas`

To import the package, place this import statement at the top of the file:

In [1]:
import pandas

In a Jupyter Notebook, it's okay to have other cells above the import, but it should come before any other Python code.  Nearly all `pandas` users import the library under the name `pd` to save typing when calling its functions:

In [None]:
#import pandas as pd

This allows you, for example, to call `pandas`'s `read_csv()` function as `pd.read_csv()` instead of the longer `pandas.read_csv()`.  We'll do this in the future, but for this introduction we'll leave it as `pandas` to help clarify which functions are coming from `pandas` and which are standard Python.

## Reading data

The `pandas` package defines a `DataFrame` object and uses it to manage data.  A `DataFrame` organizes data into a 2-dimensional array similar to a spreadsheet or database table.  `DataFrame`s can be created manually within the code, or they can be read in from an external file using one of several functions `pandas` includes for this purpose.  One such function is `pandas.read_csv()`, designed to import data from CSV files.
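
As a minimal sketch of creating a `DataFrame` manually within the code, the cell below builds one from a Python dictionary.  The column names and values are made up purely for illustration.

```python
import pandas

# Hypothetical column names and values, chosen only for illustration
myDataFrame = pandas.DataFrame({
    'channel': [1, 2, 3],
    'counts': [10, 20, 30],
})

print(myDataFrame.shape)          # (number of rows, number of columns)
print(list(myDataFrame.columns))  # the column names
```

Each dictionary key becomes a column name, and each list becomes that column's values.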

### `pandas.read_csv()`

`pandas.read_csv()` takes a CSV data file and automatically converts it into a `DataFrame` object.  Below, we assign that object to the variable `myDataFrame`:

In [None]:
myDataFrame = pandas.read_csv('path/to/file.csv')   # Not a real file, don't try to run this

The location of the CSV file to read is the only *required* argument; it should be given as a string (that is, enclosed by single or double quotes).  The file path can be either absolute or relative to the location where you're working.  For Jupyter Notebooks, that's the directory where the `jupyter notebook` program was started (unless configured otherwise), and for these notebooks that's usually `/Users/QuarkNet/Jupyter/`.
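
Because the path in the cell above isn't a real file, here is a runnable sketch that hands `pandas.read_csv()` an in-memory text buffer instead.  `read_csv()` accepts file-like objects as well as path strings.  The CSV contents are invented for illustration.

```python
import io
import pandas

# Made-up CSV text standing in for a file on disk
csvText = "x,y\n1,2\n3,4\n"

# read_csv() accepts a file-like object in place of a path string
myDataFrame = pandas.read_csv(io.StringIO(csvText))

print(myDataFrame)
```

With a real file, the `io.StringIO(...)` argument would simply be replaced by the path string.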

If necessary, you can check your current working directory using the UNIX `pwd` command:

In [2]:
!pwd

/home/jgriffith/git_project_roots/Cosmic/Resources
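
The `!pwd` shell escape relies on a UNIX-style shell being available.  As an alternative sketch that stays within standard Python, `os.getcwd()` from the standard library reports the same directory:

```python
import os

# The directory that relative file paths are resolved against
currentDirectory = os.getcwd()
print(currentDirectory)
```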


In addition to the required "name of the file to read," `pandas.read_csv()` also accepts several *optional* arguments that can make life easier for you:

* `names=['column1', 'column2', 'column3', ...]`<br>
    allows you to specify the names of the columns in the CSV file as strings.  Of course, the number of items in `names` must match the number of columns `pandas.read_csv()` reads from the file.
    
    
* `header=None`<br>
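    tells `pandas.read_csv()` that the file has no header row, so the first line it reads is treated as data instead of being used for column names.  Normally, the first line of a CSV file is a list of its column names, but it could be more complicated than that.  Our cosmic ray data files, for example, have an extensive preamble instead of a single line of column names; `header=None` by itself does not remove that preamble (see `skiprows`), but it does keep the first line read from being consumed as column names.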
    
    
* `skiprows=N`<br>
    tells `pandas.read_csv()` to skip the first `N` lines of the input file, with `N` given as an integer (not a string).  This is another potentially useful approach to removing the cosmic ray data preamble lines.
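
The three optional arguments above can be combined.  The following is a sketch under stated assumptions: the file contents are invented (two preamble lines followed by three rows of numbers) and are read from an in-memory buffer rather than a real data file.

```python
import io
import pandas

# Made-up file contents: two preamble lines followed by three data rows
fileText = (
    "preamble line 1\n"
    "preamble line 2\n"
    "1,2,3\n"
    "4,5,6\n"
    "7,8,9\n"
)

myDataFrame = pandas.read_csv(
    io.StringIO(fileText),
    skiprows=2,                  # skip the two preamble lines
    header=None,                 # the remaining lines are all data
    names=['column1', 'column2', 'column3'],
)

print(myDataFrame)
```

Here `skiprows` discards the preamble, `header=None` keeps the first remaining line as data, and `names` supplies the column labels.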


The `pandas` [documentation](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html) describes many more optional arguments.  We'll add them to the above list as we discover and use them.

#### Using the data

Once you've read in the data you want to analyze and assigned the resulting `DataFrame` to a variable, `pandas` provides a variety of functions to show and manipulate the data.
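
As a small, non-exhaustive sketch of what showing and manipulating data can look like, the cell below uses `head()` to preview rows, selects a single column by name, and adds a computed column.  The `DataFrame` contents are hypothetical.

```python
import pandas

# Hypothetical data, used only to have something to display
myDataFrame = pandas.DataFrame({
    'channel': [1, 2, 3, 4],
    'counts': [10, 20, 30, 40],
})

print(myDataFrame.head(2))             # show the first two rows
print(myDataFrame['counts'].mean())    # average of one column

# Add a new column computed from an existing one
myDataFrame['doubled'] = myDataFrame['counts'] * 2
print(myDataFrame)
```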

## Parsing cosmic ray data
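
A minimal sketch of the kind of code this section describes, under stated assumptions: the sample file, its path, and its preamble and data lines are hypothetical stand-ins written to a temporary directory, and `lineNumber` is a name chosen here for illustration.  Only the indicator line text and the names `dataIn` and `indicatorLine` come from the discussion that follows.

```python
import os
import tempfile

# Hypothetical stand-in for a cosmic ray data file: a short preamble,
# the indicator line, then a placeholder data line.
sampleText = (
    "preamble line 1\n"
    "preamble line 2\n"
    "ST Enabled, scalar data plus reset counters\n"
    "placeholder data line\n"
)
samplePath = os.path.join(tempfile.mkdtemp(), "sample_data.txt")
with open(samplePath, "w") as dataOut:
    dataOut.write(sampleText)

indicatorLine = None
with open(samplePath) as dataIn:
    # enumerate() pairs each line with its (zero-based) line number
    for lineNumber, line in enumerate(dataIn):
        if "ST Enabled, scalar data plus reset counters" in line:
            indicatorLine = lineNumber
            break   # no need to read any further for now

print(indicatorLine)
```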

Using `with` in this manner ensures that the file is closed when it's no longer needed, even if there's an error, which prevents runaway memory usage and file corruption problems.

The above code reads in `dataIn` line-by-line, using the `enumerate()` function to include line numbers along with the lines.  When we hit the line `ST Enabled, scalar data plus reset counters`, we know we're about to start the real cosmic ray data lines, so we record this line number as `indicatorLine`.  We then `break` because we don't need to read in any more at the moment.