# Describing data with statistics and graphs  

This activity uses data from the [CMS detector](https://cms.cern/detector) at CERN in Geneva, Switzerland as an example. Before you get started:
- You won't hurt anything by experimenting. If you break it, close the tab and open the activity again to start over.
- Is this your first time? Need a refresher? Try the 5-minute [Intro to Coding activity](./intro.ipynb) and come back here. 

In [None]:
# import some software packages we'll use
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt

In [None]:
# read in a data file.
data = pd.read_csv('https://github.com/QuarkNet-HEP/coding-camp/raw/main/data/Double_Muon_Run2011A.csv')
data.head()

## Descriptive statistics  
For numerical data, useful descriptive statistics include the number of values, the mean, the median, the minimum, and the maximum.

In [None]:
# describe can show useful quantities for a single column
data['E1'].describe()

In [None]:
# or get just the quantity you need
data['E1'].mean()

In [None]:
# store a quantity in a variable to reuse it later (nothing is displayed)
med = data['E1'].median()
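
The mean and the median can tell quite different stories when a column contains a few very large values. The sketch below uses a small made-up list of energies (the numbers are hypothetical and do not come from the CMS file) to show how a single large value pulls the mean upward while the median stays near the middle of the data.

```python
import pandas as pd

# Hypothetical energies in GeV, invented for illustration only
energies = pd.Series([5.0, 6.0, 7.0, 8.0, 200.0])

mean_energy = energies.mean()      # sum of the values divided by how many there are
median_energy = energies.median()  # the middle value once the data are sorted

print('mean:', mean_energy)
print('median:', median_energy)
```

Try changing the 200.0 to a smaller number and see which statistic moves more.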

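For a column of categorical data, `describe()` calculates different quantities: the number of values (`count`), the number of distinct values (`unique`), the most common value (`top`), and how often that value appears (`freq`).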

In [None]:
# this is a column of categorical values
data['Type2'].describe()

In [None]:
# shows all the different values in that column
data['Type2'].unique()
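
Besides listing the distinct values, it is often useful to know how many times each one appears. A minimal sketch, assuming a small made-up column (the labels below are hypothetical stand-ins, not read from the CMS file), uses the pandas `value_counts()` method for this:

```python
import pandas as pd

# Hypothetical stand-in for a categorical column such as data['Type2']
types = pd.Series(['G', 'T', 'G', 'G', 'T'])

distinct = types.unique()      # the different values, in order of first appearance
counts = types.value_counts()  # how many times each value appears

print(distinct)
print(counts)
```

The same call should work on the real column, for example `data['Type2'].value_counts()`.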

`describe()` can also show descriptive statistics for the entire dataframe, *but* by default it omits categorical columns.

In [None]:
data.describe()

In [None]:
# to analyze all columns, even categorical ones
data.describe(include='all')
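
To see the difference between the two calls without downloading anything, here is a small sketch on a made-up dataframe. The column names mirror the CMS file, but the values are hypothetical.

```python
import pandas as pd

# Hypothetical mini-dataframe with one numerical and one categorical column
frame = pd.DataFrame({
    'E1': [10.0, 20.0, 30.0],
    'Type2': ['G', 'T', 'G'],
})

numeric_summary = frame.describe()            # numerical columns only
full_summary = frame.describe(include='all')  # every column

print(numeric_summary)
print(full_summary)
```

In the full summary, statistics that do not apply to a column (such as the mean of `Type2`) show up as `NaN`.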

## Visualizing a single column of values  

In [None]:
# you can specify the number of bins, range of values to plot, and more.
plt.hist(data['E1'], bins=10, range=(100,400), histtype='step')
plt.show()
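
A histogram sorts the values into equal-width bins and counts how many land in each one. To peek at those counts directly, the sketch below calls `np.histogram`, which matplotlib's `plt.hist` uses for its binning. The values are made up for illustration and are not from the CMS file.

```python
import numpy as np

# Hypothetical values, invented for illustration only
values = np.array([110, 120, 150, 250, 390, 450])

# 3 equal-width bins between 100 and 400; values outside the range are ignored
counts, edges = np.histogram(values, bins=3, range=(100, 400))

print('bin edges:', edges)
print('counts per bin:', counts)
```

Notice that 450 is not counted because it falls outside the chosen range, just as values of `E1` outside `range=(100,400)` are left out of the plot above.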

## Credits
This notebook was designed by [QuarkNet](https://quarknet.org/) Teaching and Learning Fellow [Adam LaMee](https://adamlamee.github.io/). The handy csv files were created from the CMS Run2011A primary datasets and converted from ROOT format by the masterful [Tom McCauley](https://github.com/tpmccauley). More can be found on the [CERN OpenData](http://opendata.cern.ch/?ln=en) site, like [here](http://opendata.cern.ch/record/545). The 3D vector image can be found on [Wikimedia Commons](https://commons.wikimedia.org/wiki/File:Coord_XYZ.svg). Finally, thanks to the great folks at [Binder](https://mybinder.org/) and [Google Colaboratory](https://colab.research.google.com/notebooks/intro.ipynb) for making this notebook interactive without you needing to download it or install [Jupyter](https://jupyter.org/) on your own device. Find more activities and license info at [CODINGinK12.org](http://www.codingink12.org).