# Introduction to Python: working with tabular data
## workshop facilitator: Diane López, Information Specialist
### April 16, 2026

#### agenda: 
- introduction to pandas 
- quick start
- series 
- dataframe
- cleaning and moving data

<h3>What's Inside Pandas</h3>

| Pandas Dependency  | Required Version | Installed Version | Description |
|--------------------|------------------|-------------------|-------------|
| numpy              | >=1.23.2         | 2.2.4             | Supports large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. |
| python-dateutil    | >=2.8.2          | 2.9.0.post0       | Provides powerful extensions to the standard Python <code>datetime</code> module. |
| six                | >=1.5            | 1.17.0            | A Python 2 and 3 compatibility library. |
| pytz               | >=2020.1         | 2025.2            | Provides world timezone definitions, both modern and historical. |
| tzdata             | >=2022.7         | 2025.2            | A Python package containing zic-compiled binaries for timezone data used in timezone-aware datetime operations. |


<h3> Things to Know About Pandas </h3>

<p> To use Pandas, start by importing the package:</p>
<pre><code>import pandas as pd</code></pre>
<sub>(<code>pd</code> is a common alias for Pandas. It makes writing and calling Pandas functions easier—and from experience, it helps avoid misspellings!)</sub>

<p> Key concepts in Pandas:</p>
<ul>
    <li><strong>DataFrame:</strong> A table of data stored in a 2-dimensional structure, where rows and columns are labeled.</li>
    <li><strong>Series:</strong> Each column in a <code>DataFrame</code> is a <code>Series</code>, which is a one-dimensional array-like structure.</li>
</ul>

<p> You can manipulate and analyze data by applying methods to a <code>DataFrame</code> or a <code>Series</code>.</p>

<p> <em><strong>Run the line below to install the package if you are working in Jupyter Notebook or Google Colab</strong></em></p>
<p><em>just in case</em></p>

In [None]:
# %pip install pandas  # remove the leading "# " before running this cell to install

<h3>Let's Get Started — and Remember to Import Pandas!</h3>

In [24]:
import pandas as pd

<h2> Getting Started with Pandas DataFrame </h2>
<p>To manually store data in a table, create a <code>DataFrame</code>. When using a Python dictionary of lists, the dictionary keys become the column headers, and the values in each list represent the data for each column of the <code>DataFrame</code>.</p>

<h4> What is a DataFrame? </h4>

- A 2-dimensional data structure

    - Can store multiple data types:

        - Strings (text)
        - Integers
        - Floating point values
        - Categorical data

<p> Each <strong>column</strong> in a <code>DataFrame</code> is a <code>Series</code> </p>
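
<p>A minimal sketch of the points above, assuming a small made-up table (the name <code>books_df</code> and its values are hypothetical): one column per data type, plus a check that a single column is a <code>Series</code>.</p>

```python
import pandas as pd

# Hypothetical table with one column for each data type listed above
books_df = pd.DataFrame(
    {
        "title": ["Dune", "Emma", "Ulysses"],                        # strings (text)
        "copies": [3, 1, 2],                                         # integers
        "price": [9.99, 7.50, 12.25],                                # floating point values
        "genre": pd.Categorical(["sci-fi", "classic", "classic"]),   # categorical data
    }
)

# Selecting a single column with square brackets returns a Series
copies = books_df["copies"]
print(type(copies).__name__)  # Series

# One data type per column
print(books_df.dtypes)
```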

<h4> Manually creating a <code>DataFrame</code> from scratch:</h4>

1. Name the DataFrame.

2. Define a dictionary where:

    - Each key is a column name.

    - Each value is a list of cell values for that column.

    - Use <code>{ }</code> curly braces to define a dictionary.

    - Use <code>[ ]</code> square brackets to define a list.


<h3>structure and syntax </h3>
<pre><code>
DataFrame_label = pd.DataFrame({
    "column1": [
        "cell 1.1", 
        "cell 1.2", 
        "cell 1.3"
    ],
    "column2 label": [
        "cell value2.1", 
        "cell value2.2", 
        "cell value2.3"
    ],
    "column3 label": [
        "cell value3.1", 
        "cell value3.2", 
        "cell value3.3"
    ]
})

DataFrame_label
</code></pre>

<h3>Quick and Dirty Example</h3>

<p>In this example, I'm creating a <code>DataFrame</code> to keep track of attendee counts for the workshops we've hosted.</p>

<p>I start by naming the <code>DataFrame</code> as <code>pythonWs_df</code>.</p>

<p>Then, I call the function to create the <code>DataFrame</code> using <code>pd.DataFrame()</code>.</p>

<p>Inside the parentheses <code>()</code>, I begin creating the dictionary using curly braces <code>{}</code>.</p>

<p>For example: <code>({ dictionary })</code>. Next, I define the labels for the columns, like <code>"Workshops":</code></p>

<p>To add values for each <code>Series</code> (each column), I use square brackets <code>[]</code> to create a list of values.</p>

<p>Once that's done, I can run the cell to generate the table.</p>


In [None]:
pythonWs_df = pd.DataFrame(
    {  # a dictionary of three lists; the keys "Workshop", "Date", and "Attendees Numbers" become the column labels
        "Workshop": [  # the cell values for each column are contained in a list
            "Intro to Python Part A",
            "Intro to Python Part B",
            "Intro to Text Analysis Part A",
            "Intro to Text Analysis Part B",
            "Tabular Data with Python, Part I:",
            "Tabular Data with Python, Part II:",
        ],
        "Date": [
            "February 2",
            "February 26",
            "March 19", 
            "April 2",
            "April 16",
            "April 28",
        ],
        "Attendees Numbers": [
            15, 
            18, 
            22, 
            18,
            'NaN',
            'NaN',
        ],
    }
)

# Convert the "Attendees Numbers" column to numeric, coercing errors to NaN
pythonWs_df["Attendees Numbers"] = pd.to_numeric(pythonWs_df["Attendees Numbers"], errors='coerce')

pythonWs_df
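
<p>To see what <code>errors='coerce'</code> does on its own, here is a small sketch with made-up values (the <code>raw_counts</code> series is hypothetical): anything that cannot be parsed as a number becomes <code>NaN</code> instead of raising an error.</p>

```python
import pandas as pd

# Hypothetical column that mixes numbers, numeric text, and non-numeric text
raw_counts = pd.Series([15, "18", "not recorded", "NaN"])

# Values that cannot be parsed as numbers are replaced with NaN
numeric_counts = pd.to_numeric(raw_counts, errors="coerce")

print(numeric_counts.tolist())  # [15.0, 18.0, nan, nan]
print(numeric_counts.dtype)     # float64
```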

<h4>How to Explore the DataFrame</h4>

<p>Take time to understand your data by using <code>.dtypes</code> to check the data types of each column in the <code>DataFrame</code>.</p>
<p>This helps you verify whether columns are stored as strings, integers, floats, or other types—so you can clean or transform them if needed.</p>



In [None]:
pythonWs_df.dtypes

<h5> More tools to explore the DataFrame</h5>
<ul>
<li><code>.info()</code></li>
<li><code>.describe()</code></li>
<li><code>.head()</code></li>
</ul>

<h5>The <code>.info()</code> Function</h5>
<p>The <code>.info()</code> function provides a summary of the structure of a DataFrame: column names, non-null counts, data types, and memory usage.</p>

In [None]:
pythonWs_df.info()

<h5>The <code>.describe()</code> Function</h5>
<p>The <code>.describe()</code> function returns descriptive statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for the numeric columns of the DataFrame.</p>

In [None]:
pythonWs_df.describe()

<h5>The <code>.head()</code> Function</h5>
<p>The <code>.head()</code> function returns the first <em>n</em> rows.</p>
<p>The syntax is <code>DataFrame.head(n=5)</code>; by default it returns the first five rows.</p>
<p>If you want the <strong>last n rows</strong>, use <code>.tail(n)</code> instead.</p>

In [None]:
pythonWs_df.head()
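
<p>A short sketch of <code>.head()</code> next to its counterpart <code>.tail()</code>, assuming a hypothetical ten-row table named <code>demo_df</code>:</p>

```python
import pandas as pd

# Hypothetical ten-row table, numbered 1 through 10
demo_df = pd.DataFrame({"row_number": list(range(1, 11))})

first_three = demo_df.head(3)   # first 3 rows
last_three = demo_df.tail(3)    # last 3 rows
default_head = demo_df.head()   # no argument: first 5 rows

print(first_three["row_number"].tolist())  # [1, 2, 3]
print(last_three["row_number"].tolist())   # [8, 9, 10]
print(len(default_head))                   # 5
```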

<h4> Working with Series aka Column values </h4> 

<p>To call a column, write the name of the <code>DataFrame</code> followed by the column name in square brackets: <code>DataFrame["name of column"]</code>.</p>

In [None]:
# Find the maximum value in the column
max_attendees = pythonWs_df["Attendees Numbers"].max()
max_attendees
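
<p>Other summary methods work the same way on a <code>Series</code>. The sketch below rebuilds the attendee counts as a standalone <code>Series</code> (an assumption made so the example runs on its own) and shows that missing values are skipped by default.</p>

```python
import pandas as pd

# Same attendee counts as the workshop table, with NaN for sessions not yet held
attendees = pd.Series(
    [15, 18, 22, 18, float("nan"), float("nan")],
    name="Attendees Numbers",
)

print(attendees.max())    # 22.0
print(attendees.min())    # 15.0
print(attendees.sum())    # 73.0
print(attendees.mean())   # 18.25
print(attendees.count())  # 4 (non-missing values only)
```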

<h2>How to read, clean, and write tabular data?</h2>

<p>In this section, I work with San Antonio 311 Service Calls, a publicly available dataset: https://data.sanantonio.gov/</p>

<ol>
    <li>loading data: <code>df = pd.read_csv('filename.csv')</code></li>
    <li>checking for missing data: <code>df.isnull().sum()</code></li>
    <li>dropping or filling missing values: <code>df.dropna(inplace=True)</code> to drop rows, or <code>df.fillna(0, inplace=True)</code> to fill with a value</li>
    <li>removing duplicates: <code>df.drop_duplicates(inplace=True)</code></li>
    <li>renaming columns: <code>df.rename(columns={'old_name': 'new_name'}, inplace=True)</code></li>
    <li>changing data types: <code>df['column'] = df['column'].astype(float)</code></li>
    <li>string operations: <code>df['column'] = df['column'].str.strip()</code> (remove surrounding whitespace) and <code>df['column'] = df['column'].str.lower()</code> (convert to lowercase)</li>
    <li>filtering data: <code>df = df[df['column'] > 0]</code> (keep only rows where the condition is true)</li>
    <li>replacing values: <code>df['column'] = df['column'].replace({'old': 'new'})</code></li>
    <li>writing: <code>df.to_json('cleaned_data.json', orient='records', lines=True)</code></li>
</ol>
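
<p>The steps above can be sketched end to end on a tiny in-memory table. Everything in <code>raw_df</code> (column names and values) is made up for illustration, and the loading step is replaced by building the table directly so the example runs without a file. The last step calls <code>to_json</code> without a filename, which returns the text instead of writing to disk.</p>

```python
import pandas as pd

# 1. "loading": a hypothetical messy table built in memory instead of read from a CSV
raw_df = pd.DataFrame(
    {
        "Dept ": ["  Public Works", "Animal Care ", "Animal Care ", None, "Parks"],
        "calls": ["10", "4", "4", "7", "0"],
    }
)

# 2. check for missing data: number of missing values per column
missing_counts = raw_df.isnull().sum()

# 3. drop rows that contain missing values
clean_df = raw_df.dropna()

# 4. remove duplicate rows
clean_df = clean_df.drop_duplicates()

# 5. rename columns
clean_df = clean_df.rename(columns={"Dept ": "dept"})

# 6. change data types: text to integer
clean_df["calls"] = clean_df["calls"].astype(int)

# 7. string operations: strip surrounding whitespace, then lowercase
clean_df["dept"] = clean_df["dept"].str.strip().str.lower()

# 8. filter: keep only rows where the condition is true
clean_df = clean_df[clean_df["calls"] > 0].copy()

# 9. replace values
clean_df["dept"] = clean_df["dept"].replace({"animal care": "animal care services"})

# 10. write: with no filename, to_json returns the JSON Lines text
json_lines = clean_df.to_json(orient="records", lines=True)

print(missing_counts)
print(clean_df)
print(json_lines)
```

<p>This sketch assigns each result to a variable instead of using <code>inplace=True</code>; both styles produce the same table, and reassigning keeps every step visible.</p>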


In [None]:
data_dictionary = pd.read_excel('datadictionary_311.xlsx', sheet_name='Sheet1', header=2)

data_dictionary[['Field Name','Description']].head(22)

Unnamed: 0,Field Name,Description
0,_id,"The number for the respective case on the spreadsheet, automatically assisgned by the export."
1,CATEGORY,"This general category was developed to place 311 services in a high level category, different than their respective department."
2,CASEID,The unique case reference number is assigned by the 311 Lagan customer relationship management system.
3,OPENEDDATETIME,The date and time that a case was submitted.
4,SLA_Date,"Every service request type has a due date assigned to the request, based on the request type name. The SLA Date is the due date and time for the request type based on the service level agreement (SLA). Each service request type has a timeframe in which it is scheduled to be addressed."
5,CLOSEDDATETIME,"The date and time that the case/request was was closed. If blank, the request has not been closed as of the Report Ending Date."
6,Late (Yes/No),This indicates whether the case has surpassed its Service Level Agreement due date for the specific service request.
7,Dept,The City department to whom the case is assigned.
8,REASONNAME,The department division within the City deaprtment to whom the case is assigned.
9,TYPENAME,"The service request type name for the issue being reported. Examples include stray animals, potholes, overgrown yards, junk vehicles, traffic signal malfunctions, etc."



In [47]:
# loading data
sa_311 = pd.read_csv(r'SA_311_April_11.csv')

# checking for missing data
sa_311.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 566498 entries, 0 to 566497
Data columns (total 18 columns):
 #   Column                Non-Null Count   Dtype  
---  ------                --------------   -----  
 0   _id                   566498 non-null  int64  
 1   Category              566498 non-null  object 
 2   CASEID                566498 non-null  int64  
 3   OPENEDDATETIME        566498 non-null  object 
 4   SLA_Date              566245 non-null  object 
 5   CLOSEDDATETIME        537325 non-null  object 
 6   Late (Yes/No)         566498 non-null  object 
 7   Dept                  566498 non-null  object 
 8   REASONNAME            566498 non-null  object 
 9   TYPENAME              566498 non-null  object 
 10  CaseStatus            566498 non-null  object 
 11  SourceID              566498 non-null  object 
 12  OBJECTDESC            566498 non-null  object 
 13  Council District      566498 non-null  int64  
 14  XCOORD                566469 non-null  float64
 15  

In [56]:
sa_311.tail(10)


Unnamed: 0,_id,Category,CASEID,OPENEDDATETIME,SLA_Date,CLOSEDDATETIME,Late (Yes/No),Dept,REASONNAME,TYPENAME,CaseStatus,SourceID,OBJECTDESC,Council District,XCOORD,YCOORD,Report Starting Date,Report Ending Date
566475,566476,Animals,1020249698,4/4/2025,6/3/2025,4/4/2025,NO,Animal Care Services,Field Operations,Animals(Stray Animal),Closed,Constituent Call,"202 N TAYMAN ST, SAN ANTONIO, 78226",5,2112922.0,13685722.0,4/5/2024,4/5/2025
566477,566478,Animals,1020249700,4/4/2025,4/14/2025,4/4/2025,NO,Animal Care Services,Field Operations,Animals(Public Nuisance),Closed,Constituent Call,"3010 COLONY DR, SAN ANTONIO, 78230",1,2113755.0,13743617.0,4/5/2024,4/5/2025
566479,566480,Animals,1020249702,4/4/2025,6/3/2025,4/4/2025,NO,Animal Care Services,Field Operations,Animals(Stray Animal),Closed,Constituent Call,CLAVER and MARTIN LUTHER KING DR,2,2147812.0,13698293.0,4/5/2024,4/5/2025
566480,566481,Traffic Signals and Signs,1020249703,4/4/2025,4/8/2025,4/4/2025,NO,Public Works,Traffic Operations,Traffic Signals (Maintenance_Emergency),Closed,Constituent Call,HUNT LN and MARBACH RD,6,2075741.0,13699623.0,4/5/2024,4/5/2025
566483,566484,Traffic Signals and Signs,1020249706,4/4/2025,4/8/2025,4/4/2025,NO,Public Works,Traffic Operations,Traffic Signals (Maintenance_Emergency),Closed,Constituent Call,HUNT LN and MARBACH RD,6,2075741.0,13699623.0,4/5/2024,4/5/2025
566490,566491,Animals,1020249713,4/4/2025,4/5/2025,4/5/2025,NO,Animal Care Services,Field Operations,Injured-Sick Animal,Closed,Constituent Call,"9802 PERRIN BEITEL, SAN ANTONIO, 78217",2,2155878.0,13740472.0,4/5/2024,4/5/2025
566492,566493,Animals,1020249715,4/4/2025,4/14/2025,4/4/2025,NO,Animal Care Services,Field Operations,Animals(Public Nuisance),Closed,311 Mobile App,"1315 W RUSSELL PLACE, SAN ANTONIO, 78201",1,2123626.0,13711953.0,4/5/2024,4/5/2025
566495,566496,Animals,1020249718,4/4/2025,4/5/2025,4/4/2025,NO,Animal Care Services,Field Operations,Animals(Aggressive Critical),Closed,311 Mobile App,"715 JOHN PAGE DR, SAN ANTONIO, 78228",7,2109697.0,13717792.0,4/5/2024,4/5/2025
566496,566497,Traffic Signals and Signs,1020249720,4/4/2025,4/8/2025,4/4/2025,NO,Public Works,Traffic Operations,Traffic Signals (Maintenance_Emergency),Closed,Constituent Call,HUNT LN and TRES CAMINOS,4,2073929.0,13705031.0,4/5/2024,4/5/2025
566497,566498,Traffic Signals and Signs,1020249722,4/4/2025,4/8/2025,4/5/2025,NO,Public Works,Traffic Operations,Traffic Signals (Maintenance_Emergency),Closed,311 Mobile App,BROADWAY and E GRAYSON ST,1,2135085.0,13709179.0,4/5/2024,4/5/2025


In [49]:
sa_311.describe(include='all')

Unnamed: 0,_id,Category,CASEID,OPENEDDATETIME,SLA_Date,CLOSEDDATETIME,Late (Yes/No),Dept,REASONNAME,TYPENAME,CaseStatus,SourceID,OBJECTDESC,Council District,XCOORD,YCOORD,Report Starting Date,Report Ending Date
count,566498.0,566498,566498.0,566498,566245,537325,566498,566498,566498,566498,566498,566498,566498,566498.0,566469.0,566469.0,566498,566498
unique,,12,,1957,2271,369,2,9,24,253,2,3,220225,,,,1,1
top,,Solid Waste Services,,7/9/2024,1/7/2025,5/15/2024,NO,Solid Waste Management,Code Enforcement,No Pickup,Closed,Constituent Call,"10362 SAHARA, SAN ANTONIO, 78216",,,,4/5/2024,4/5/2025
freq,,183038,,2983,3140,3574,458836,184693,173154,50431,537327,426034,1084,,,,566498,566498
mean,283249.5,,1019816000.0,,,,,,,,,,,4.597547,2121002.0,13711330.0,,
std,163534.030735,,503167.6,,,,,,,,,,,2.748,26329.86,27389.62,,
min,1.0,,1014569000.0,,,,,,,,,,,0.0,2030256.0,13601200.0,,
25%,141625.25,,1019699000.0,,,,,,,,,,,2.0,2104373.0,13691850.0,,
50%,283249.5,,1019903000.0,,,,,,,,,,,4.0,2121378.0,13707910.0,,
75%,424873.75,,1020082000.0,,,,,,,,,,,7.0,2138968.0,13730140.0,,


In [50]:
# dropping missing values
sa_311.dropna(inplace=True)
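
<p>Calling <code>dropna()</code> with no arguments removes every row that has a missing value in any column. As a hedged alternative, not what this notebook does, the <code>subset</code> parameter limits the check to chosen columns. The three-row <code>sample_df</code> below is hypothetical and only borrows column names from the 311 data.</p>

```python
import pandas as pd

# Hypothetical three-row sample that borrows column names from the 311 data
sample_df = pd.DataFrame(
    {
        "CASEID": [1, 2, 3],
        "CLOSEDDATETIME": ["4/4/2025", None, "4/5/2025"],
        "XCOORD": [2112922.0, 2113755.0, None],
    }
)

# Drop rows with a missing value in any column
dropped_any = sample_df.dropna()

# Drop only rows where XCOORD is missing; the still-open case is kept
dropped_coords = sample_df.dropna(subset=["XCOORD"])

print(len(sample_df), len(dropped_any), len(dropped_coords))  # 3 1 2
```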



In [51]:
sa_311.describe(include='all')

Unnamed: 0,_id,Category,CASEID,OPENEDDATETIME,SLA_Date,CLOSEDDATETIME,Late (Yes/No),Dept,REASONNAME,TYPENAME,CaseStatus,SourceID,OBJECTDESC,Council District,XCOORD,YCOORD,Report Starting Date,Report Ending Date
count,537046.0,537046,537046.0,537046,537046,537046,537046,537046,537046,537046,537046,537046,537046,537046.0,537046.0,537046.0,537046,537046
unique,,12,,1646,1951,369,2,9,23,242,1,3,214152,,,,1,1
top,,Solid Waste Services,,7/9/2024,1/7/2025,5/15/2024,NO,Solid Waste Management,Code Enforcement,No Pickup,Closed,Constituent Call,"Woodlawn Lake Park, 1103 CINCINNATI AVE, SAN ANTONIO, 78201",,,,4/5/2024,4/5/2025
freq,,180593,,2964,3109,3573,446853,182239,163678,50078,537046,405114,856,,,,537046,537046
mean,281558.991099,,1019846000.0,,,,,,,,,,,4.611476,2120938.0,13711440.0,,
std,158572.065689,,370495.3,,,,,,,,,,,2.753012,26366.55,27445.7,,
min,22.0,,1015502000.0,,,,,,,,,,,0.0,2030256.0,13601200.0,,
25%,144669.25,,1019704000.0,,,,,,,,,,,2.0,2104256.0,13691920.0,,
50%,280769.5,,1019899000.0,,,,,,,,,,,4.0,2121304.0,13708110.0,,
75%,417501.75,,1020074000.0,,,,,,,,,,,7.0,2138888.0,13730230.0,,
