Skip to content

Latest commit

 

History

History
125 lines (99 loc) · 4.79 KB

python-dataframe-pandas.md

File metadata and controls

125 lines (99 loc) · 4.79 KB
title titleSuffix description author ms.author ms.date ms.service ms.subservice ms.topic monikerRange
Insert data from a SQL table into a Python pandas dataframe
SQL machine learning
Learn how to read data from a SQL table and insert into a pandas dataframe using Python.
WilliamDAssafMSFT
wiassaf
07/23/2020
sql
machine-learning
how-to
>=sql-server-2017||>=sql-server-linux-ver15||=azuresqldb-mi-current||=azuresqldb-current

Insert data from a SQL table into a Python pandas dataframe

[!INCLUDESQL Server SQL DB SQL MI]

This article describes how to insert SQL data into a pandas dataframe using the pyodbc package in Python. The rows and columns of data contained within the dataframe can be used for further data exploration.

Prerequisites

::: moniker range=">=sql-server-2017||>=sql-server-linux-ver15"

::: moniker range="=azuresqldb-current"

::: moniker range="=azuresqldb-mi-current"

Verify restored database

You can verify that the restored database exists by querying the Person.CountryRegion table:

USE AdventureWorks;
SELECT * FROM Person.CountryRegion;

Install Python packages

Download and Install Azure Data Studio.

Install the following Python packages:

  • pyodbc
  • pandas

To install these packages:

  1. In your Azure Data Studio notebook, select Manage Packages.
  2. In the Manage Packages pane, select the Add new tab.
  3. For each of the following packages, enter the package name, click Search, then click Install.

Insert data

Use the following script to select data from Person.CountryRegion table and insert into a dataframe. Edit the connection string variables: 'server', 'database', 'username', and 'password' to connect to SQL.

To create a new notebook:

  1. In Azure Data Studio, select File, select New Notebook.
  2. In the notebook, select kernel Python3, select the +code.
  3. Paste code in notebook, select Run All.
import pyodbc
import pandas as pd
# Some other example server values are
# server = 'localhost\sqlexpress' # for a named instance
# server = 'myserver,port' # to specify an alternate port
server = 'servername' 
database = 'AdventureWorks' 
username = 'yourusername' 
password = 'databasename'  
cnxn = pyodbc.connect('DRIVER={SQL Server};SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+ password)
cursor = cnxn.cursor()
# select 26 rows from SQL table to insert in dataframe.
query = "SELECT [CountryRegionCode], [Name] FROM Person.CountryRegion;"
df = pd.read_sql(query, cnxn)
print(df.head(26))

Output

The print command in the preceding script displays the rows of data from the pandas dataframe df.

CountryRegionCode                 Name
0                 AF          Afghanistan
1                 AL              Albania
2                 DZ              Algeria
3                 AS       American Samoa
4                 AD              Andorra
5                 AO               Angola
6                 AI             Anguilla
7                 AQ           Antarctica
8                 AG  Antigua and Barbuda
9                 AR            Argentina
10                AM              Armenia
11                AW                Aruba
12                AU            Australia
13                AT              Austria
14                AZ           Azerbaijan
15                BS         Bahamas, The
16                BH              Bahrain
17                BD           Bangladesh
18                BB             Barbados
19                BY              Belarus
20                BE              Belgium
21                BZ               Belize
22                BJ                Benin
23                BM              Bermuda
24                BT               Bhutan
25                BO              Bolivia

Next steps