# Introduction

In this tutorial, you'll learn how to use **[pandas](https://pandas.pydata.org)**, one of the most popular Python libraries for data analysis.

You'll start by learning how to create and manipulate data using pandas. Then, you'll explore how to work with existing data stored in files. Finally, you'll learn how to connect to a SQL database, retrieve data, and load it into a pandas DataFrame for analysis.

Let's get started!

# Using pandas

Pandas is a library for working with tabular data. To use pandas, you'll typically start by importing it under the conventional alias `pd`:

In [1]:
import pandas as pd
print("Pandas is ready to use!")

Pandas is ready to use!


# Creating Data

There are two main objects in pandas: **DataFrame** and **Series**.

### DataFrame
A DataFrame is a table with rows and columns. Each column can hold data of a specific type (e.g., numbers, strings).

Here’s an example of creating a simple DataFrame:

In [2]:
pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

,Yes,No
0,50,131
1,21,2


In this example:
- The column names are `Yes` and `No`.
- The values for each column are supplied as lists, one list per dictionary key.
- Each row represents a record in the table.
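To check these points, here is a minimal sketch that rebuilds the same table and inspects it. The variable name `votes` is an assumption made for illustration and does not appear in the original example.

```python
import pandas as pd

# Hypothetical variable name; the data is the same as in the example above
votes = pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

print(list(votes.columns))      # column names come from the dictionary keys
print(votes.shape)              # (number of rows, number of columns)
print(votes.loc[0].to_dict())   # the first record, as a dictionary
```

Assigning the DataFrame to a variable is what makes it possible to inspect or reuse it in later cells.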

DataFrames can also hold text data. For example:

In [3]:
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 'Sue': ['Pretty good.', 'Bland.']})

,Bob,Sue
0,I liked it.,Pretty good.
1,It was awful.,Bland.


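### Series
A Series is a sequence of data values. Where a DataFrame is a table, a Series is a single column of one. A Series can also have a name, which describes the data it contains.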
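As a brief illustration of a named Series, the sketch below builds one from a list and then pulls a column out of a DataFrame. The values and the names `scores` and `table` are invented for this example.

```python
import pandas as pd

# Hypothetical example data and name, chosen only for illustration
scores = pd.Series([87, 90, 85], name='points')

print(scores.name)    # the label attached to the Series
print(scores.shape)   # a Series has a single dimension

# A DataFrame column selected by name is itself a Series named after the column
table = pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})
yes_column = table['Yes']
print(type(yes_column).__name__, yes_column.name)
```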

# Reading Data from Files

Most of the time, you'll work with data that already exists in files. Pandas makes it easy to load data from common file formats like CSV.

Here’s an example of reading data from a CSV file:

In [6]:
wine_reviews = pd.read_csv("U:/Python Class/winemag-data-130k-v2.csv")
wine_reviews.head()  # Display the first few rows of the data

,Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks


You can use the `shape` attribute to check the size of the DataFrame:

In [7]:
wine_reviews.shape

(129971, 14)

The first value is the number of rows and the second is the number of columns, so this DataFrame has 129,971 rows and 14 columns.
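The output of `head()` includes a column called `Unnamed: 0`, which suggests that the file's first column holds row numbers without a header. One way to handle this is the `index_col` parameter of `pd.read_csv`. The sketch below uses a tiny in-memory CSV, built from a few values shown above, rather than the real file, so the variable names and the sample text are assumptions.

```python
import io
import pandas as pd

# Invented two-row sample mimicking a CSV whose first column is an unlabeled row number
csv_text = ",country,points\n0,Italy,87\n1,Portugal,87\n"

# Default behavior: the unlabeled column is read as data and named "Unnamed: 0"
default_read = pd.read_csv(io.StringIO(csv_text))

# With index_col=0, the first column becomes the row index instead
indexed_read = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(default_read.columns))
print(default_read.shape)
print(indexed_read.shape)
```

With the real file, the same idea would be `pd.read_csv(path, index_col=0)`, which should leave one column fewer in `shape`.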

# Connecting to a SQL Database

Sometimes, your data is stored in a database instead of a file. Pandas can load the results of a SQL query through a database connection, and this tutorial creates that connection with the `pyodbc` library.

Let’s walk through how to connect to a SQL database and retrieve data.

### Steps:
1. Define the connection parameters (server name and database name).
2. Create a connection string.
3. Establish a connection using `pyodbc.connect()`.
4. Write a SQL query to retrieve data.
5. Use `pd.read_sql()` to execute the query and load the results into a pandas DataFrame.

In [8]:
import pyodbc

# Define your connection parameters
server = 'mss-p1-biss-01'
database = 'bissabcanalytics'

# Create a connection string
connection_string = (
    f'DRIVER={{SQL Server Native Client 11.0}};'
    f'SERVER={server};'
    f'DATABASE={database};'
    'Trusted_Connection=yes;'  # Use this for Windows Authentication
)

# Establish a connection to the database
connection = pyodbc.connect(connection_string)
print("Connection to SQL database established!")

Connection to SQL database established!


# Writing and Executing a SQL Query

With the connection established, you can write a SQL query to retrieve data from the database.

### Example Query:
Retrieve the first 1000 rows from the `dbo.wine_reviews` table.

In [None]:
# Write your SQL query
query = """
SELECT TOP 1000 *
FROM dbo.wine_reviews
"""

# Execute the query and load the data into a pandas DataFrame
# (recent pandas versions may warn that raw pyodbc connections are not officially tested)
wine_reviews = pd.read_sql(query, connection)

# Display the first few rows of the DataFrame
wine_reviews.head()
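If no SQL Server instance is available, the same query-to-DataFrame step can be tried against an in-memory SQLite database, using Python's built-in `sqlite3` module. This is only a stand-in sketch: the miniature table, its contents, and the variable names `sample` and `result` are assumptions, and SQLite limits rows with `LIMIT` where SQL Server uses `TOP`.

```python
import sqlite3
import pandas as pd

# In-memory SQLite database used as a stand-in for the SQL Server connection
connection = sqlite3.connect(":memory:")

# Hypothetical miniature table; the real dbo.wine_reviews table has more columns
sample = pd.DataFrame({
    'country': ['Italy', 'Portugal', 'US'],
    'points': [87, 87, 87],
})
sample.to_sql('wine_reviews', connection, index=False)

# SQLite syntax: LIMIT instead of SQL Server's TOP
query = """
SELECT *
FROM wine_reviews
LIMIT 2
"""

# Execute the query and load the results into a DataFrame
result = pd.read_sql(query, connection)

# Release the connection once the data is loaded
connection.close()

print(result.shape)
```

Closing the connection after loading applies equally to the `pyodbc` connection created earlier.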

# Summary

In this tutorial, you learned how to:
- Use pandas for data analysis.
- Create and manipulate data using DataFrames and Series.
- Read data from CSV files.
- Connect to a SQL database using `pyodbc`.
- Write and execute a SQL query.
- Load the query results into a pandas DataFrame.

You can now explore and analyze your data using pandas!