# Introduction to Python: working with tabular data
## workshop facilitator: Diane López, Information Specialist
### April 16, 2026

#### agenda: 
- introduction to pandas 
- quick start
- series 
- dataframe
- cleaning and moving data

<h3>What's Inside Pandas</h3>

| Pandas Dependency  | Required Version | Installed Version | Description |
|--------------------|------------------|-------------------|-------------|
| numpy              | >=1.23.2         | 2.2.4             | Supports large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. |
| python-dateutil    | >=2.8.2          | 2.9.0.post0       | Provides powerful extensions to the standard Python <code>datetime</code> module. |
| six                | >=1.5            | 1.17.0            | A Python 2 and 3 compatibility library. |
| pytz               | >=2020.1         | 2025.2            | Provides world timezone definitions, both modern and historical. |
| tzdata             | >=2022.7         | 2025.2            | A Python package containing zic-compiled binaries for timezone data used in timezone-aware datetime operations. |


<h3> Things to Know About Pandas </h3>

<p> To use Pandas, start by importing the package:</p>
<pre><code>import pandas as pd</code></pre>
<sub>(<code>pd</code> is a common alias for Pandas. It makes writing and calling Pandas functions easier—and from experience, it helps avoid misspellings!)</sub>

<p> Key concepts in Pandas:</p>
<ul>
    <li><strong>DataFrame:</strong> A table of data stored in a 2-dimensional structure, where rows and columns are labeled.</li>
    <li><strong>Series:</strong> Each column in a <code>DataFrame</code> is a <code>Series</code>, which is a one-dimensional array-like structure.</li>
</ul>

<p>You can manipulate and analyze data by applying methods to a <code>DataFrame</code> or a <code>Series</code>.</p>
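
<p>A minimal sketch of the two concepts, assuming a small made-up table (<code>demo_df</code> and its columns are hypothetical and not part of the workshop data). It shows that selecting one column yields a <code>Series</code>, and that both objects have their own methods.</p>

```python
import pandas as pd

# Hypothetical example data, not part of the workshop dataset
demo_df = pd.DataFrame({
    "title": ["Book A", "Book B", "Book C"],
    "copies": [3, 1, 4],
})

# Selecting a single column returns a Series
copies = demo_df["copies"]

print(type(demo_df).__name__)  # DataFrame
print(type(copies).__name__)   # Series

# A DataFrame method: sort the rows by the values in one column
sorted_df = demo_df.sort_values("copies")
print(sorted_df["title"].tolist())  # ['Book B', 'Book A', 'Book C']

# A Series method: add up the values in the column
print(copies.sum())  # 8
```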

<p> <em><strong>Run the line below to install the package if you are working in Jupyter Notebook or Google Colab</strong></em></p>
<p><em>just in case</em></p>

In [12]:
# Delete the "# " at the start of the next line before running the install
# %pip install pandas

<h3>Let's Get Started — and Remember to Import Pandas!</h3>

In [13]:
import pandas as pd

<h2> Getting Started with Pandas DataFrame </h2>
<p>To manually store data in a table, create a <code>DataFrame</code>. When using a Python dictionary of lists, the dictionary keys become the column headers, and the values in each list represent the data for each column of the <code>DataFrame</code>.</p>

<h4> What is a DataFrame? </h4>

- A 2-dimensional data structure

    - Can store multiple data types:

        - Strings (text)
        - Integers
        - Floating point values
        - Categorical data

<p> Each <strong>column</strong> in a <code>DataFrame</code> is a <code>Series</code> </p>

<h4> Manually creating a <code>DataFrame</code> from scratch:</h4>

1. Name the DataFrame.

2. Define a dictionary where:

    - Each key is a column name.

    - Each value is a list of cell values for that column.

    - Use <code>{ }</code> curly braces to define a dictionary.

    - Use <code>[ ]</code> square brackets to define a list.


<h3>Structure and Syntax</h3>
<pre><code>
DataFrame_label = pd.DataFrame({
    "column1 label": [
        "cell value1.1",
        "cell value1.2",
        "cell value1.3"
    ],
    "column2 label": [
        "cell value2.1", 
        "cell value2.2", 
        "cell value2.3"
    ],
    "column3 label": [
        "cell value3.1", 
        "cell value3.2", 
        "cell value3.3"
    ]
})

DataFrame_label
</code></pre>

<h3>Quick and Dirty Example</h3>

<p>In this example, I'm creating a <code>DataFrame</code> to keep track of attendee counts for the workshops we've hosted.</p>

<p>I start by naming the <code>DataFrame</code> as <code>pythonWs_df</code>.</p>

<p>Then, I call the function to create the <code>DataFrame</code> using <code>pd.DataFrame()</code>.</p>

<p>Inside the parentheses <code>()</code>, I begin creating the dictionary using curly braces <code>{}</code>.</p>

<p>For example: <code>({ dictionary })</code>. Next, I define the labels for the columns, like <code>"Workshop":</code>.</p>

<p>To add values for each <code>Series</code> (each column), I use square brackets <code>[]</code> to create a list of values.</p>

<p>Once that's done, I can run the cell to generate the table.</p>


In [14]:
pythonWs_df = pd.DataFrame(
    {  # a dictionary of three lists; the keys "Workshop", "Date", and "Attendees Numbers" become the column labels
        "Workshop": [  # the cell values for each column are contained in a list
            "Intro to Python Part A",
            "Intro to Python Part B",
            "Intro to Text Analysis Part A",
            "Intro to Text Analysis Part B",
            "Tabular Data with Python, Part I:",
            "Tabular Data with Python, Part II:",
        ],
        "Date": [
            "February 2",
            "February 26",
            "March 19", 
            "April 2",
            "April 16",
            "April 28",
        ],
        "Attendees Numbers": [
            15, 
            18, 
            22, 
            18,
            'NaN',
            'NaN',
        ],
    }
)

# Convert the "Attendees Numbers" column to numeric, coercing errors to NaN
pythonWs_df["Attendees Numbers"] = pd.to_numeric(pythonWs_df["Attendees Numbers"], errors='coerce')

pythonWs_df

|   | Workshop | Date | Attendees Numbers |
|---|----------|------|-------------------|
| 0 | Intro to Python Part A | February 2 | 15.0 |
| 1 | Intro to Python Part B | February 26 | 18.0 |
| 2 | Intro to Text Analysis Part A | March 19 | 22.0 |
| 3 | Intro to Text Analysis Part B | April 2 | 18.0 |
| 4 | Tabular Data with Python, Part I: | April 16 | NaN |
| 5 | Tabular Data with Python, Part II: | April 28 | NaN |
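
<p>To see what <code>errors='coerce'</code> does on its own, here is a small sketch with made-up values (the <code>raw</code> Series is hypothetical). Anything that cannot be read as a number becomes <code>NaN</code> instead of raising an error.</p>

```python
import pandas as pd

# Hypothetical raw values: a mix of numbers and text
raw = pd.Series([15, "18", "NaN", "not recorded"])

# errors="coerce" replaces anything that cannot be parsed with NaN
clean = pd.to_numeric(raw, errors="coerce")

print(clean.tolist())      # [15.0, 18.0, nan, nan]
print(clean.isna().sum())  # 2
```

<p>Because <code>NaN</code> is a floating point value, the converted column is stored as floats, which is why the attendee counts above display as <code>15.0</code> rather than <code>15</code>.</p>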


<h4>How to Explore the DataFrame</h4>

<p>Take time to understand your data by using <code>.dtypes</code> to check the data types of each column in the <code>DataFrame</code>.</p>
<p>This helps you verify whether columns are stored as strings, integers, floats, or other types—so you can clean or transform them if needed.</p>



In [15]:
pythonWs_df.dtypes

Workshop              object
Date                  object
Attendees Numbers    float64
dtype: object
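
<p>As one possible transformation, text dates like those in the <code>Date</code> column can be turned into real dates with <code>pd.to_datetime()</code>. This is only a sketch: the year 2026 is an assumption added for illustration, since the column itself stores no year.</p>

```python
import pandas as pd

# A few date strings in the same style as the "Date" column
dates = pd.Series(["February 2", "February 26", "March 19"])

# Assumption: every workshop took place in 2026, so the year is appended
# to let pandas parse each string as a complete date
parsed = pd.to_datetime(dates + ", 2026", format="%B %d, %Y")

# Check that the new Series really holds dates
print(pd.api.types.is_datetime64_any_dtype(parsed))  # True

# Date parts are now available through the .dt accessor
print(parsed.dt.day.tolist())    # [2, 26, 19]
print(parsed.dt.month.tolist())  # [2, 2, 3]
```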

<h5> More tools to explore the DataFrame</h5>
<ul> 
<li><code>.info()</code></li>
<li><code>.describe()</code></li>
<li><code>.head()</code></li>
</ul>

<h5>The <code>.info()</code> Function</h5>
<p>The <code>.info()</code> function provides a summary of the structure of a DataFrame.</p>

In [16]:
pythonWs_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Workshop           6 non-null      object 
 1   Date               6 non-null      object 
 2   Attendees Numbers  4 non-null      float64
dtypes: float64(1), object(2)
memory usage: 276.0+ bytes
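
<p>The <code>Non-Null Count</code> column hints at missing values: 4 non-null out of 6 entries means two are missing. A minimal sketch of counting them directly with <code>.isna().sum()</code>, using a small made-up frame (<code>small_df</code> is hypothetical):</p>

```python
import pandas as pd

# Hypothetical frame with two missing attendee counts
small_df = pd.DataFrame({
    "Workshop": ["A", "B", "C"],
    "Attendees Numbers": [15.0, None, None],
})

# .isna() marks missing cells as True; .sum() counts them per column
missing = small_df.isna().sum()

print(missing["Workshop"])           # 0
print(missing["Attendees Numbers"])  # 2
```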


<h5>The <code>.describe()</code> Function</h5>
<p>The <code>.describe()</code> function returns summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for the numeric columns of the DataFrame.</p>

In [17]:
pythonWs_df.describe()

|       | Attendees Numbers |
|-------|-------------------|
| count | 4.0 |
| mean  | 18.25 |
| std   | 2.872281 |
| min   | 15.0 |
| 25%   | 17.25 |
| 50%   | 18.0 |
| 75%   | 19.0 |
| max   | 22.0 |
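
<p>A short sketch of where these numbers come from, rebuilding the attendee counts as a standalone <code>Series</code> (the variable names are for illustration only). Note that the count is 4, not 6: missing values are left out of the statistics.</p>

```python
import pandas as pd

# The same attendee counts as above, with the two missing values as None
attendees = pd.Series([15, 18, 22, 18, None, None], name="Attendees Numbers")

summary = attendees.describe()

print(summary["count"])  # 4.0
print(summary["mean"])   # 18.25

# Individual statistics are also available as separate methods
print(attendees.mean())    # 18.25
print(attendees.median())  # 18.0
```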


<h5>The <code>.head()</code> Function</h5>
<p>The <code>.head()</code> function returns the first <em>n rows</em>.</p>
<p>The syntax is <code>DataFrame.head(n=5)</code>; by default it returns the first five rows.</p>
<p>If you want the <strong>last n rows</strong> instead, use <code>.tail()</code>.</p>

In [18]:
pythonWs_df.head()

|   | Workshop | Date | Attendees Numbers |
|---|----------|------|-------------------|
| 0 | Intro to Python Part A | February 2 | 15.0 |
| 1 | Intro to Python Part B | February 26 | 18.0 |
| 2 | Intro to Text Analysis Part A | March 19 | 22.0 |
| 3 | Intro to Text Analysis Part B | April 2 | 18.0 |
| 4 | Tabular Data with Python, Part I: | April 16 | NaN |
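
<p>A small sketch of passing a value for <code>n</code>, together with <code>.tail()</code>, the counterpart of <code>.head()</code> for the end of the table. The frame <code>numbers_df</code> is made up for illustration.</p>

```python
import pandas as pd

# Hypothetical six-row frame
numbers_df = pd.DataFrame({"n": [10, 20, 30, 40, 50, 60]})

first_two = numbers_df.head(2)  # rows with index 0 and 1
last_two = numbers_df.tail(2)   # rows with index 4 and 5

print(first_two["n"].tolist())  # [10, 20]
print(last_two["n"].tolist())   # [50, 60]
```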


<h4> Working with Series aka Column values </h4> 

<p>To select a column, write the name of the <code>DataFrame</code> followed by the column name in square brackets: <code>DataFrame["name of column"]</code>.</p>

In [19]:
# Find the maximum value in the column
max_attendees = pythonWs_df["Attendees Numbers"].max()
max_attendees

np.float64(22.0)
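
<p>Other <code>Series</code> methods work the same way. The sketch below uses a shortened copy of the workshop table (<code>ws_df</code> is a hypothetical name) and also looks up which row holds the maximum, using <code>.idxmax()</code> together with <code>.loc</code>.</p>

```python
import pandas as pd

# Shortened copy of the workshop table, for illustration only
ws_df = pd.DataFrame({
    "Workshop": [
        "Intro to Python Part A",
        "Intro to Python Part B",
        "Intro to Text Analysis Part A",
    ],
    "Attendees Numbers": [15, 18, 22],
})

attendees = ws_df["Attendees Numbers"]

print(attendees.min())  # 15
print(attendees.sum())  # 55

# .idxmax() returns the index label of the largest value;
# .loc then looks up the workshop name in that row
busiest = ws_df.loc[attendees.idxmax(), "Workshop"]
print(busiest)  # Intro to Text Analysis Part A
```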

<h2> How to read and write tabular data? </h2>

<h4>