## The Web Scraping Recipe

Scraping information from the web takes three steps:
1. **MAPPING**: Finding the URLs of the pages that contain the information you want.
2. **DOWNLOAD**: Fetching the pages via HTTP.
3. **PARSE**: Extracting the information from the HTML.

You could also add steps such as `connection`, `storing`, and `logging`.
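
The three core steps can be sketched as three small functions. This is a minimal sketch that uses only the standard library so it runs without extra installs; the names `map_urls`, `download`, and `parse` and the `example-scraper` User-Agent are illustrative assumptions, and the regular-expression parser is a simple stand-in for a real HTML parser.

```python
import re
from urllib.request import Request, urlopen


def map_urls(base_url, offsets):
    """MAPPING: build the list of page URLs to visit."""
    return [f"{base_url}?offset={offset}" for offset in offsets]


def download(url, timeout=10):
    """DOWNLOAD: fetch one page over HTTP and return its text."""
    request = Request(url, headers={"User-Agent": "example-scraper"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def parse(html):
    """PARSE: return the text of every <h2> heading in the HTML."""
    headings = re.findall(r"<h2[^>]*>(.*?)</h2>", html, flags=re.S | re.I)
    # Strip any tags nested inside the heading, then trim whitespace
    return [re.sub(r"<[^>]+>", "", heading).strip() for heading in headings]
```

A full run would chain them: `[parse(download(url)) for url in map_urls(...)]`.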
   


### Packages used
* for connecting to the internet we use: **requests**
* for parsing: **beautifulsoup** and regular expressions (**re**)
* for automatic browsing / screen scraping: **selenium**
* for pausing between requests, to reduce the risk of errors: **time**

We will write our scrapers in basic Python; for larger projects, consider looking into the **scrapy** package.
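
As one illustration of how `time` can help with errors, the sketch below pauses between retries of a failing request. `fetch_with_retries` and its parameters are hypothetical names, not part of any library; `fetch` is any function that takes a URL and returns a result, for example a thin wrapper around `requests.get`.

```python
import time


def fetch_with_retries(fetch, url, attempts=3, pause=0.5, sleep=time.sleep):
    """Call fetch(url), waiting a little longer after each failed attempt."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except Exception as error:  # broad on purpose: any failure triggers a retry
            last_error = error
            if attempt < attempts:
                sleep(pause * attempt)  # 0.5 s, then 1.0 s, ...
    raise last_error
```

The `sleep` argument exists only so the waiting can be swapped out in tests; in normal use the default `time.sleep` applies.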

In [2]:
import pandas as pd
import numpy as np
import seaborn as sns
import yfinance as yf
import os

In [3]:
import requests
from bs4 import BeautifulSoup
import re
import selenium
import time

import tqdm
import json

#### Parsing method 1
Pass the URL to `pd.read_html(url)` (Read HTML tables into a list of DataFrame objects) to get the tables

In [12]:
# NBA site
url = 'https://www.basketball-reference.com/leagues/NBA_2018.html' # link to the website
dfs = pd.read_html(url) # parses all tables found on the page; this is effectively mapping, download, and parse in one call
dfs[1] # show the second table

Unnamed: 0,Western Conference,W,L,W/L%,GB,PS/G,PA/G,SRS
0,Houston Rockets*,65,17,0.793,—,112.4,103.9,8.21
1,Golden State Warriors*,58,24,0.707,7.0,113.5,107.5,5.79
2,Portland Trail Blazers*,49,33,0.598,16.0,105.6,103.0,2.6
3,Oklahoma City Thunder*,48,34,0.585,17.0,107.9,104.4,3.42
4,Utah Jazz*,48,34,0.585,17.0,104.1,99.8,4.47
5,New Orleans Pelicans*,48,34,0.585,17.0,111.7,110.4,1.48
6,San Antonio Spurs*,47,35,0.573,18.0,102.7,99.8,2.89
7,Minnesota Timberwolves*,47,35,0.573,18.0,109.5,107.3,2.35
8,Denver Nuggets,46,36,0.561,19.0,110.0,108.5,1.57
9,Los Angeles Clippers,42,40,0.512,23.0,109.0,109.0,0.15


#### Parsing method 2
url -> `requests.get(url)` -> `BeautifulSoup(.content, 'html.parser')` -> `.find_all('h2')[0]` returns the first second-level heading (h2)

In [13]:
url = 'https://www.basketball-reference.com/leagues/NBA_2018.html'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')  #parse
soup.find_all('h2')[0].text  #identify

'Conference Standings'
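
To see the same lookups without a network request, here is a small sketch that runs BeautifulSoup on an inline HTML string. The snippet is hand-written for illustration and only loosely modelled on the page; the variable names are assumptions.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for a downloaded page
html = """
<div class="table_wrapper" id="all_confs_standings_E">
  <h2>Conference Standings</h2>
  <table id="confs_standings_E"><caption>Conference Standings Table</caption></table>
</div>
<h2>Division Standings</h2>
"""

soup = BeautifulSoup(html, "html.parser")

first_h2 = soup.find_all("h2")[0].text               # list of all matches, take the first
same_h2 = soup.find("h2").text                       # find() returns the first match directly
wrapper = soup.find("div", class_="table_wrapper")   # class_ because class is reserved in Python
caption = wrapper.find("caption").text               # search only inside the wrapper
missing = soup.find("h3")                            # no match: find() returns None
```

Because `find` returns `None` when nothing matches, calling `.text` on a missing element raises an `AttributeError`, so it is worth checking for `None` on pages whose layout may vary.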

#### Worked example 1
<p>Target site: www.jobnet.dk</p>
Collect the first 100 job postings into a table and find the 10 most common occupation areas
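
Fetching 100 postings at 20 per page means visiting five result pages. A minimal sketch of building those page URLs, assuming the site pages through results with an `offset` query parameter (as the search URL used in this example does); `page_urls` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def page_urls(base_url, n_results, page_size):
    """Return one search URL per results page, stepping the offset by page_size."""
    return [
        f"{base_url}?{urlencode({'offset': offset})}"
        for offset in range(0, n_results, page_size)
    ]


links = page_urls('https://job.jobnet.dk/CV/FindWork/Search', n_results=100, page_size=20)
```

`range(0, n_results, page_size)` also covers a partial last page, so 45 results at 20 per page give three URLs.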

In [18]:
# Define a function that writes one row per request to a log file
def log(response, logfile, output_path=None):
    if output_path is None:
        output_path = os.getcwd() #Resolve the default when the function is called, not when it is defined
    # If the log file does not exist, create it and write headers for the log variables
    if not os.path.isfile(logfile):
        header = ['timestamp','status_code','length','output_file']
        with open(logfile,'w') as f:
            f.write(';'.join(header) + "\n") #Make the headers and jump to new line

    # Gather log information
    status_code = response.status_code #Status code from the request result
    timestamp = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time())) #Local time
    length = len(response.text) #Length of the HTML-string

    # Open the log file and append the gathered log information
    with open(logfile,'a') as f:
        f.write(f'{timestamp};{status_code};{length};{output_path}' + "\n") #Append the information and jump to new line
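
The same semicolon-separated log can also be written with the standard `csv` module, which takes care of quoting if a field ever contains a semicolon. This is a sketch under that assumption; `log_row` is a hypothetical name, and it takes plain values rather than a response object so it can be tried without making a request.

```python
import csv
import os
import time


def log_row(logfile, status_code, length, output_file):
    """Append one log row, writing the header first if the file is new."""
    is_new = not os.path.isfile(logfile)
    with open(logfile, 'a', newline='') as f:
        writer = csv.writer(f, delimiter=';')
        if is_new:
            writer.writerow(['timestamp', 'status_code', 'length', 'output_file'])
        timestamp = time.strftime('%Y-%m-%d %H:%M:%S')  # local time
        writer.writerow([timestamp, status_code, length, output_file])
```

With a `requests` response this could be called as `log_row('log3.csv', response.status_code, len(response.text), os.getcwd())`.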

In [19]:
links = []
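for offset in range(0,5*20,20): # the offset step of 20 is the paging pattern found in the URLs; we want the first 100 postings at 20 per page, hence 5*20
    url = f'https://job.jobnet.dk/CV/FindWork/Search?offset={offset}'   # open the page with the data you need, then look up its Request URL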
    links.append(url)
    
logfile = 'log3.csv' # log file
list_htmls = []
jobs_first100 = pd.DataFrame()

for url in tqdm.tqdm(links):
    try:
        response = requests.get(url, headers={'name':'Siyi','email':'wasariii@outlook.com'})
    except Exception as e: # the request raised an error
        print(url) #Print url
        print(e) #Print error
        jobs_first100.to_csv('jobs_first100.csv') #Save the dataframe as a csv file to retrieve at another time
        continue #Continue to next iteration of the loop
    
    if response.ok: #Check that the request succeeded (status code below 400)
        result_json = response.json() #If it did, decode the JSON body
    else: #Otherwise print the status_code and continue to next iteration of the loop
        print(response.status_code)
        continue
    
    result_df = pd.DataFrame(result_json['JobPositionPostings']) # key found in the browser under Network -> Search -> Preview
    jobs_first100 = pd.concat([jobs_first100,result_df], axis=0, ignore_index=True) #Append to the rest of the data
    log(response, logfile)
    time.sleep(0.5) #Sleep for 0.5 seconds to pace the requests
jobs_first100

100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:04<00:00,  1.09it/s]


Unnamed: 0,AutomatchType,Abroad,Weight,Title,JobHeadline,Presentation,HiringOrgName,WorkPlaceAddress,WorkPlacePostalCode,WorkPlaceCity,...,HiringOrgCVR,UserLoggedIn,AnonymousEmployer,ShareUrl,DetailsUrl,JobLogUrl,HasLocationValues,ID,Latitude,Longitude
0,0,False,1.0,Er du vores nye kollega?,Er du vores nye kollega?,Søegaard Vandteknik søger en stabil og service...,SØEGAARD VANDTEKNIK ApS,Navervej 20,4000,Roskilde,...,32284213,False,False,https://job.jobnet.dk/CV/FindWork/DetailsSocia...,https://job.jobnet.dk/CV/FindWork/Details/5653474,https://job.jobnet.dk/CV/FindWork/Details/5653474,True,5653474,55.6390,12.1198
1,0,False,1.0,Vi har brug for dig til vores produktion,Vi har brug for dig til vores produktion,Søegaard Vandteknik søger stabil og robust pro...,SØEGAARD VANDTEKNIK ApS,Biltris Gade 25,4070,Kirke Hyllinge,...,32284213,False,False,https://job.jobnet.dk/CV/FindWork/DetailsSocia...,https://job.jobnet.dk/CV/FindWork/Details/5653469,https://job.jobnet.dk/CV/FindWork/Details/5653469,True,5653469,55.7243,11.9323
2,0,False,1.0,Regnskabschef,Regnskabschef,"FULLHOUSE A/S søger en erfaren regnskabschef, ...",FULL HOUSE A/S,Sydholmen 1,2650,Hvidovre,...,16444286,False,False,https://job.jobnet.dk/CV/FindWork/DetailsSocia...,https://job.jobnet.dk/CV/FindWork/Details/5651028,https://job.jobnet.dk/CV/FindWork/Details/5651028,True,5651028,55.6084,12.4692
3,0,False,1.0,Pædagog/PAU til bocenter for borgere med særli...,Pædagog/PAU til bocenter for borgere med særli...,Kan du danse polka?Hør lige her hvad en medarb...,Hillerød Kommune,Nødebovej,3480,Fredensborg,...,29189366,False,False,https://job.jobnet.dk/CV/FindWork/DetailsSocia...,https://job.jobnet.dk/CV/FindWork/Details/5653471,https://job.jobnet.dk/CV/FindWork/Details/5653471,True,5653471,55.9709,12.4020
4,0,False,1.0,Kok,Kok,Kok søges til Storebælt Sinatur Hotel & Konfer...,Storebælt Sinatur Hotel & Konference,Østerøvej 121,5800,Nyborg,...,55602816,False,False,https://job.jobnet.dk/CV/FindWork/DetailsSocia...,https://job.jobnet.dk/CV/FindWork/Details/5653470,https://job.jobnet.dk/CV/FindWork/Details/5653470,True,5653470,55.3102,10.8183
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,0,False,1.0,Svendborg Søfartsskole søger controller,Svendborg Søfartsskole søger controller,Svendborg Søfartsskole\nsøger\nController\n \n...,SVENDBORG SØFARTSSKOLE,Overgade 6,5700,Svendborg,...,64305018,False,False,https://job.jobnet.dk/CV/FindWork/DetailsSocia...,https://job.jobnet.dk/CV/FindWork/Details/5653309,https://job.jobnet.dk/CV/FindWork/Details/5653309,True,5653309,55.0144,10.5980
96,0,False,1.0,Medarbejder til Ungdomsmodtagelse Bornholm,Medarbejder til Ungdomsmodtagelse Bornholm,Brænder du for at arbejde med unge menneskers ...,Bornholms Regionskommune,Ullasvej 23,3700,Rønne,...,26696348,False,False,https://job.jobnet.dk/CV/FindWork/DetailsSocia...,https://job.jobnet.dk/CV/FindWork/Details/5653315,https://job.jobnet.dk/CV/FindWork/Details/5653315,True,5653315,55.0925,14.7087
97,0,False,1.0,Marinekonstabel til Søværnets Tamburkorps – in...,Marinekonstabel til Søværnets Tamburkorps – in...,"Har du musik i blodet, og kunne du tænke dig a...",Marinestation København,Henrik Gerners Plads 1,1439,København K,...,16287180,False,False,https://job.jobnet.dk/CV/FindWork/DetailsSocia...,https://job.jobnet.dk/CV/FindWork/Details/5653314,https://job.jobnet.dk/CV/FindWork/Details/5653314,True,5653314,55.6847,12.6076
98,0,False,1.0,Dansk Isoleringsteknik i Ans søger Isolatør,Dansk Isoleringsteknik i Ans søger Isolatør,\n \nDansk Isoleringsteknik i Ans søger en fa...,Dansk Isoleringsteknik ApS,Ansvej 51,8643,Ans By,...,42440280,False,False,https://job.jobnet.dk/CV/FindWork/DetailsSocia...,https://job.jobnet.dk/CV/FindWork/Details/5620156,https://job.jobnet.dk/CV/FindWork/Details/5620156,True,5620156,56.3124,9.5485
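
The search endpoint returns JSON with the postings stored under the `JobPositionPostings` key. The sketch below shows that structure on a hand-written payload that keeps only three of the fields seen in the table above; `extract_postings` and its `fields` parameter are hypothetical names.

```python
import json

# Hand-written stand-in for one response body; real responses carry many more fields
payload = '''{"JobPositionPostings": [
  {"ID": 5653474, "Title": "Er du vores nye kollega?", "WorkPlaceCity": "Roskilde"},
  {"ID": 5651028, "Title": "Regnskabschef", "WorkPlaceCity": "Hvidovre"}
]}'''


def extract_postings(text, fields=("ID", "Title", "WorkPlaceCity")):
    """Decode the JSON text and keep only the chosen fields of each posting."""
    data = json.loads(text)
    postings = data.get("JobPositionPostings", [])  # empty list if the key is absent
    return [{field: posting.get(field) for field in fields} for posting in postings]


rows = extract_postings(payload)
```

A list of dictionaries like `rows` can be passed straight to `pd.DataFrame(rows)`.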


In [20]:
# Count the 10 most common OccupationArea values, in descending order
jobs_first100.groupby(jobs_first100['OccupationArea'])['ID'].count().sort_values(ascending=False)[0:10]

OccupationArea
Akademisk arbejde                             16
Pædagogisk, socialt og kirkeligt arbejde      13
Hotel, restauration, køkken, kantine          10
Sundhed, omsorg og personlig pleje            10
Undervisning og vejledning                     7
Rengøring, ejendomsservice og renovation       6
Industriel produktion                          6
Kontor, administration, regnskab og finans     6
Bygge og anlæg                                 6
Salg, indkøb og markedsføring                  6
Name: ID, dtype: int64
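
For a simple frequency count like this, `value_counts` is a shorter alternative to the group-by chain. A small sketch on made-up rows (the `jobs` frame below is illustrative data, not scraped output); the two results agree as long as the `ID` column has no missing values.

```python
import pandas as pd

# Made-up rows that reuse three of the occupation areas listed above
jobs = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6],
    'OccupationArea': ['Akademisk arbejde', 'Bygge og anlæg', 'Akademisk arbejde',
                       'Industriel produktion', 'Akademisk arbejde', 'Bygge og anlæg'],
})

# Group-by version, as in the cell above
top_grouped = jobs.groupby('OccupationArea')['ID'].count().sort_values(ascending=False).head(2)

# value_counts sorts in descending order by default
top_counted = jobs['OccupationArea'].value_counts().head(2)
```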

#### Worked example 2
<p>Target site: www.dr.dk/nyheder/udland</p>
List the title, lead, and time of the first 10 articles

In [4]:
url = 'https://www.dr.dk/nyheder/udland' 
response = requests.get(url, headers={'name':'Siyi','email':'wasariii@outlook.com'})
soup = BeautifulSoup(response.content,'lxml')

articles = soup.find_all('div', class_ = 'dre-teaser-content') #get the article teasers; the class name comes from the browser's developer tools (Elements panel)
#(class_ is used because class is reserved in Python) 

In [5]:
list_of_article_urls = []
# Creating a loop that appends the article url to the list above
for i in range(len(articles)):
    list_of_article_urls.append(articles[i].find('a')['href'])  # collect the url of every article


list_of_article_urls_final = []
for link in list_of_article_urls:
    if '/nyheder/udland' in link: #All article URLs have this string in them, so we restrict on it being in the URL
        list_of_article_urls_final.append(link)
print(list_of_article_urls_final)

['/nyheder/udland/ghana-vil-ulovliggoere-homoseksualitet', '/nyheder/udland/amerikansk-radiovaert-skal-betale-millionerstatning-loegne-om-skoleskyderi', '/nyheder/udland/ukrainsk-militaer-bryder-krigsregler-foerer-krig-fra-private-hjem-skoler-og', '/nyheder/udland/knap-var-flyet-landet-i-taiwan-foer-en-storpolitisk-konflikt-var-i-lys-lue-her-er', '/nyheder/udland/drs-matilde-kimer-er-blevet-udvist-af-rusland', '/nyheder/udland/foerste-amerikanske-delstat-har-stemt-nej-til-fjerne-retten-til-fri-abort', '/nyheder/udland/pelosi-roser-taiwans-demokrati-mens-kina-sender-kampfly-paa-vingerne', '/nyheder/udland/taiwan-byder-pelosi-velkommen-med-aabne-arme-mens-kina-skruer-op-trusler', '/nyheder/udland/al-qaeda-leder-blev-draebt-i-diplomatkvarter-fem-minutters-gang-fra-tidligere-dansk', '/nyheder/udland/nancy-pelosi-trodser-kinesiske-advarsler-ankommer-til-taiwan-til-historisk-besoeg', '/nyheder/udland/corona-laeges-selvmord-saetter-gang-i-oestrigsk-debat-om-netchikane', '/nyheder/udland/puk-d
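
The scraped links are relative paths, so they need the site's base URL before they can be requested. A minimal sketch using `urllib.parse.urljoin`; `article_urls` is a hypothetical helper, and dropping repeated links is an added assumption that the original loop does not make.

```python
from urllib.parse import urljoin

BASE_URL = 'https://www.dr.dk'


def article_urls(hrefs, section='/nyheder/udland'):
    """Keep links from the given section, drop repeats, and make them absolute."""
    seen = set()
    urls = []
    for href in hrefs:
        if section in href and href not in seen:
            seen.add(href)
            urls.append(urljoin(BASE_URL, href))
    return urls
```

`urljoin` also leaves already-absolute links unchanged, which string concatenation would not.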

In [23]:
title_list = []
lead_list = []
time_list = []

for i in range(10): #use len(list_of_article_urls_final) instead of 10 to scrape every article
    
    # This time we scrape for each news article in the url list we created before
    url = 'https://www.dr.dk' + list_of_article_urls_final[i] #The scraped links are relative, so we need to add the base url to get each article's full url
    response = requests.get(url, headers={'name':'Siyi','email':'wasariii@outlook.com'})
    soup = BeautifulSoup(response.content,'lxml')
    
    # Append title to list
    temp = soup.find_all('h1')
    temp = temp[1]
    temp = temp.text.strip()
    title_list.append(temp)
    
    # Append lead to list
    temp = soup.find('p', class_='dre-article-title__summary')
    temp = temp.text.strip()
    lead_list.append(temp)

    # Append time posted to list
    temp = soup.find('time', class_='dre-byline__date')
    temp = temp['datetime']
    time_list.append(temp)

In [24]:
df = pd.DataFrame({'title':title_list, 'lead':lead_list, 'time':time_list})
df

Unnamed: 0,title,lead,time
0,Ghana vil straffe homoseksualitet med fængsel ...,Afrika-korrespondent kalder det et 'tilbagesla...,2022-08-05T11:58:00+00:00
1,Amerikansk radiovært skal betale millionerstat...,"Alex Jones har i årevis påstået, at massakren ...",2022-08-05T03:55:00+00:00
2,Amnesty: Ukrainsk militær bryder krigsregler -...,Zelenskyj kritiserer Amnesty-rapport for at væ...,2022-08-04T11:48:00+00:00
3,"Knap var flyet landet i Taiwan, før en storpol...",Konflikten om ø-staten handler om årtiers poli...,2022-08-03T18:50:00+00:00
4,DR's Matilde Kimer er blevet udvist af Rusland,Rusland slår hårdt ned på uafhængige medier og...,2022-08-03T16:29:00+00:00
5,Første amerikanske delstat har stemt 'nej' til...,Resultatet er en vigtigt sejr for tilhængere a...,2022-08-03T08:56:00+00:00
6,"Pelosi roser Taiwans demokrati, mens Kina send...",Den amerikanske toppolitikers uanmeldte besøg ...,2022-08-03T03:57:00+00:00
7,"Taiwan byder Pelosi velkommen med åbne arme, m...","USA skal støtte demokrati alle steder, skriver...",2022-08-02T18:53:00+00:00
8,Al-Qaeda-leder blev dræbt i diplomatkvarter - ...,Terrorleders tilstedeværelse midt i hovedstade...,2022-08-02T16:14:00+00:00
9,Nancy Pelosi trodser kinesiske advarsler: Anko...,Formanden for Repræsentanternes Hus er ankomme...,2022-08-02T14:52:00+00:00
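
The `time` column holds ISO 8601 strings, which the standard library can turn into timezone-aware `datetime` objects for sorting or date arithmetic. A short sketch using two of the timestamps from the table above; the variable names are illustrative.

```python
from datetime import datetime

# Two timestamps copied from the table above
stamps = ['2022-08-05T11:58:00+00:00', '2022-08-02T14:52:00+00:00']

parsed = [datetime.fromisoformat(stamp) for stamp in stamps]

newest = max(parsed)            # datetimes compare chronologically
span = parsed[0] - parsed[1]    # a timedelta between the two articles
```

On a DataFrame, `pd.to_datetime(df['time'])` performs the same conversion for the whole column.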


In [6]:
# what if we need the body?
url = 'https://www.dr.dk/nyheder/udland/gazprom-strammer-ifoelge-tyskland-skruen-uden-grund' 
response = requests.get(url, headers={'name':'Siyi','email':'wasariii@outlook.com'})
soup = BeautifulSoup(response.content,'lxml')
body = soup.find('div', class_ = 'dre-article-body')

'''
This body consists of both sections with text and figures. We want it all.
But sections and figures have different tags, so we cannot just use find_all to find all elements in the body.
Instead we can use .children. It finds all children of the element body:
'''

body_text = []
for child in body.children:
    body_text.append(child.text)
print(body_text)

['Gazprom halverer gasleverancerne til Europa via Nord Stream 1. Årsagen er ifølge selskabet vedligehold af en gasturbine. Den daglige gasforsyning via gasledningen vil fra onsdag morgen blive reduceret til 33 millioner kubikmeter, oplyser Gazprom.Det svarer til cirka 20 procent af den maksimale kapacitet, og det fremgår ikke, hvor længe den yderligt reducerede forsyning af gas vil stå på.', '', 'Den tyske regering anser den forklaringen om vedligeholdelse for at være opfundet til lejligheden.- Ifølge vores oplysninger er der ingen teknisk grund til en reduktion i leverancerne, siger en talskvinde for Finansministeriet og minister Robert Habeck til Frankfurter Allgemeine Zeitung.Tyskerne får 25 procent af deres energi fra gas, hvor en overvejende del er kommet fra Rusland.Gasprisen stiger med 10 procentDet er anden gang indenfor en uge, at Gazprom reducerer leverancen af gas under påskud af reperation af gasturbiner. Da Gazprom efter ti dages vedligehold i sidste uge genåbnede for gasf

In [26]:
'''
We have used .text to get the text of the HTML. The figure elements do not contain any text, so they will just be empty.
We can use .join() to join all the strings in the list. Just join it on an empty string:
# Joining on '' concatenates the pieces, so the empty strings simply disappear
'''

''.join(body_text)

'Gazprom halverer gasleverancerne til Europa via Nord Stream 1. Årsagen er ifølge selskabet vedligehold af en gasturbine. Den daglige gasforsyning via gasledningen vil fra onsdag morgen blive reduceret til 33 millioner kubikmeter, oplyser Gazprom.Det svarer til cirka 20 procent af den maksimale kapacitet, og det fremgår ikke, hvor længe den yderligt reducerede forsyning af gas vil stå på.Den tyske regering anser den forklaringen om vedligeholdelse for at være opfundet til lejligheden.- Ifølge vores oplysninger er der ingen teknisk grund til en reduktion i leverancerne, siger en talskvinde for Finansministeriet og minister Robert Habeck til Frankfurter Allgemeine Zeitung.Tyskerne får 25 procent af deres energi fra gas, hvor en overvejende del er kommet fra Rusland.Gasprisen stiger med 10 procentDet er anden gang indenfor en uge, at Gazprom reducerer leverancen af gas under påskud af reperation af gasturbiner. Da Gazprom efter ti dages vedligehold i sidste uge genåbnede for gasforsyninge
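
Joining on an empty string glues the paragraphs together with no space between them, as in "Gazprom.Det" above. If paragraph breaks should be kept, one option is to drop the empty pieces and join on a blank line instead. A sketch on made-up strings; `body_text` here is a stand-in for the scraped list.

```python
# Made-up stand-in for the scraped list, including empty and whitespace-only pieces
body_text = ['First paragraph.', '', 'Second paragraph.', '   ', 'Third paragraph.']

# Plain concatenation: paragraphs run together
joined_plain = ''.join(body_text)

# Drop pieces that are empty after stripping, then separate paragraphs with a blank line
paragraphs = [part.strip() for part in body_text if part.strip()]
joined_clean = '\n\n'.join(paragraphs)
```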

#### Worked example 3
<p>Target site: https://www.basketball-reference.com/leagues/NBA_2018.html</p>
List all the tables on the page

In [7]:
url = 'https://www.basketball-reference.com/leagues/NBA_2018.html' 
response = requests.get(url, headers={'name':'Siyi','email':'wasariii@outlook.com'})
soup = BeautifulSoup(response.content,'lxml')
table_node = soup.find('div', class_ = 'table_wrapper')  # find() returns only the first match, so this is a single table

print(table_node)

<div class="table_wrapper" id="all_confs_standings_E">
<div class="section_heading assoc_confs_standings_E" id="confs_standings_E_sh">
<span class="section_anchor" data-label="Conference Standings" id="confs_standings_E_link"></span><h2>Conference Standings</h2> <div class="section_heading_text">
<ul><li><small>* Playoff teams</small></li>
</ul>
</div>
</div>
<div class="table_container" id="div_confs_standings_E">
<table class="suppress_all sortable stats_table" data-cols-to-freeze=",1" id="confs_standings_E">
<caption>Conference Standings Table</caption>
<colgroup><col/><col/><col/><col/><col/><col/><col/><col/></colgroup>
<thead>
<tr>
<th aria-label="Eastern Conference" class="poptip sort_default_asc left" data-stat="team_name" scope="col">Eastern Conference</th>
<th aria-label="Wins" class="poptip right" data-stat="wins" data-tip="Wins" scope="col">W</th>
<th aria-label="Losses" class="poptip right" data-stat="losses" data-tip="Losses" scope="col">L</th>
<th aria-label="Win-Loss Pe

In [12]:
# Define a function that builds a DataFrame from the node printed above
def parse_html_table(table_node):
    # Get the columns in a list
    columns_html = table_node.thead.find_all('th')
    # Extract the text
    columns = [col.text for col in columns_html]

    rows_list = table_node.tbody.find_all('tr')

    data = []
    for row_node in rows_list:
        row = []
        for child in row_node.children:  # loops over every child node of the row, not only the text (?)
            row.append(child.text)
        data.append(row)
    df = pd.DataFrame(data,columns=columns)
    return df
df = parse_html_table(table_node)
df

Unnamed: 0,Eastern Conference,W,L,W/L%,GB,PS/G,PA/G,SRS
0,Toronto Raptors*,59,23,0.72,—,111.7,103.9,7.29
1,Boston Celtics*,55,27,0.671,4.0,104.0,100.4,3.23
2,Philadelphia 76ers*,52,30,0.634,7.0,109.8,105.3,4.3
3,Cleveland Cavaliers*,50,32,0.61,9.0,110.9,109.9,0.59
4,Indiana Pacers*,48,34,0.585,11.0,105.6,104.2,1.18
5,Miami Heat*,44,38,0.537,15.0,103.4,102.9,0.15
6,Milwaukee Bucks*,44,38,0.537,15.0,106.5,106.8,-0.45
7,Washington Wizards*,43,39,0.524,16.0,106.6,106.0,0.53
8,Detroit Pistons,39,43,0.476,20.0,103.8,103.9,-0.26
9,Charlotte Hornets,36,46,0.439,23.0,108.2,108.0,0.07


In [18]:
tables = soup.find_all('table') #Locate all table nodes

dfs = []
for i in range(10): #use "len(tables)" instead of 10 to get all tables. len(tables)=13, but not all of them can be listed at the moment, probably because some contain empty entries
    table = parse_html_table(tables[i]) #Apply parse_html_table function
    dfs.append(table) # store table in a list
dfs[9]

Unnamed: 0,Rk,Team,G,MP,FG,FGA,FG%,3P,3PA,3P%,...,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
0,1,Boston Celtics*,82,19805,38.7,88.0,0.44,9.7,28.6,0.339,...,0.763,10.0,35.4,45.4,22.0,7.5,4.6,14.6,19.8,103.9
1,2,Utah Jazz*,82,19755,38.8,86.4,0.449,9.9,27.0,0.365,...,0.771,9.0,34.3,43.3,20.8,8.6,4.8,15.5,21.4,103.9
2,3,San Antonio Spurs*,82,19730,40.1,88.5,0.453,9.6,27.6,0.348,...,0.759,9.7,35.0,44.6,22.9,8.0,4.1,14.8,20.7,104.8
3,4,Philadelphia 76ers*,82,19780,37.9,87.4,0.434,10.1,29.5,0.342,...,0.745,9.9,32.1,42.1,21.7,8.5,5.1,14.3,20.3,105.0
4,5,Toronto Raptors*,82,19830,39.1,87.2,0.449,9.1,25.4,0.357,...,0.767,10.0,33.2,43.2,22.2,7.3,5.0,14.6,20.3,105.9
5,6,Houston Rockets*,82,19755,40.4,87.4,0.462,10.3,29.5,0.351,...,0.746,8.8,34.1,42.9,22.9,7.6,4.5,14.9,20.8,106.1
6,7,Miami Heat*,82,19930,38.9,86.6,0.45,9.9,27.5,0.36,...,0.783,9.4,35.2,44.5,21.7,7.8,4.8,14.6,20.0,106.3
7,8,Portland Trail Blazers*,82,19755,39.6,88.7,0.447,10.0,27.4,0.364,...,0.755,9.6,34.7,44.3,20.8,7.6,5.3,13.0,19.7,106.4
8,9,Oklahoma City Thunder*,82,19830,39.5,86.2,0.458,11.5,31.4,0.367,...,0.769,9.8,33.5,43.3,23.9,7.9,4.7,16.5,21.9,107.2
9,10,Detroit Pistons,82,19805,40.4,88.0,0.459,11.4,31.8,0.359,...,0.776,9.5,35.7,45.2,26.0,7.5,5.0,15.3,19.0,107.3
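
One way to attempt every table without the loop stopping at the first one that fails is to catch the error per table and record which ones were skipped. A minimal sketch; `parse_all` is a hypothetical helper, and `parse` stands for any single-table parser such as `parse_html_table`.

```python
def parse_all(tables, parse):
    """Apply parse to every table; collect results and note the ones that fail."""
    parsed = []
    failed = []
    for index, table in enumerate(tables):
        try:
            parsed.append(parse(table))
        except Exception as error:  # keep going, but remember what went wrong
            failed.append((index, repr(error)))
    return parsed, failed
```

With the objects from the cells above this could be called as `dfs, failed = parse_all(tables, parse_html_table)`; inspecting `failed` then shows which tables need a different parsing approach.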


In [31]:
# What pd.read_html does: Read HTML tables into a list of DataFrame objects.
a = pd.read_html(url)
a[12]  # this approach returns every table on the page

Unnamed: 0_level_0,Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,Unnamed: 5_level_0,Unnamed: 6_level_0,% of FGA by Distance,% of FGA by Distance,% of FGA by Distance,...,% of FG Ast'd,Unnamed: 23_level_0,Dunks,Dunks,Unnamed: 26_level_0,Layups,Layups,Unnamed: 29_level_0,Corner,Corner
Unnamed: 0_level_1,Rk,Team,G,MP,FG%,Dist.,Unnamed: 6_level_1,2P,0-3,3-10,...,3P,Unnamed: 23_level_1,%FGA,Md.,Unnamed: 26_level_1,%FGA,Md.,Unnamed: 29_level_1,%3PA,3P%
0,1.0,Atlanta Hawks,82,19705,0.469,13.7,,0.646,0.271,0.127,...,0.872,,0.051,318,,0.259,1024,,0.216,0.381
1,2.0,Boston Celtics*,82,19805,0.44,13.1,,0.674,0.286,0.162,...,0.818,,0.046,294,,0.286,1094,,0.181,0.417
2,3.0,Brooklyn Nets,82,19855,0.466,12.6,,0.726,0.26,0.185,...,0.78,,0.043,275,,0.277,1080,,0.186,0.401
3,4.0,Chicago Bulls,82,19855,0.472,13.7,,0.624,0.292,0.122,...,0.864,,0.051,330,,0.264,1110,,0.213,0.393
4,5.0,Charlotte Hornets,82,19780,0.468,13.7,,0.657,0.262,0.144,...,0.853,,0.049,314,,0.258,1064,,0.194,0.418
5,6.0,Cleveland Cavaliers*,82,19730,0.474,13.4,,0.641,0.299,0.132,...,0.832,,0.055,362,,0.276,1148,,0.213,0.446
6,7.0,Dallas Mavericks,82,19805,0.469,13.8,,0.649,0.238,0.167,...,0.826,,0.051,314,,0.216,855,,0.19,0.384
7,8.0,Denver Nuggets,82,19880,0.476,12.8,,0.668,0.268,0.191,...,0.835,,0.049,310,,0.291,1188,,0.247,0.39
8,9.0,Detroit Pistons,82,19805,0.459,13.6,,0.638,0.251,0.177,...,0.904,,0.046,288,,0.271,1059,,0.233,0.387
9,10.0,Golden State Warriors*,82,19730,0.447,13.0,,0.676,0.278,0.166,...,0.802,,0.052,347,,0.274,1089,,0.189,0.378
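
The table above has two header rows, which `pd.read_html` returns as two-level column labels with `Unnamed: ...` placeholders where the top row is empty. A sketch of one way to flatten them into single names; `flatten_columns` and the `' | '` separator are illustrative choices, and the small frame reuses a few labels and values from the output above.

```python
import pandas as pd


def flatten_columns(df):
    """Collapse two-level column labels, dropping 'Unnamed' placeholders."""
    flat = []
    for top, bottom in df.columns:
        if str(top).startswith('Unnamed'):
            flat.append(str(bottom))            # no group label: keep the lower name
        else:
            flat.append(f'{top} | {bottom}')    # keep both levels
    out = df.copy()
    out.columns = flat
    return out


columns = pd.MultiIndex.from_tuples([
    ('Unnamed: 0_level_0', 'Rk'),
    ('Unnamed: 1_level_0', 'Team'),
    ('% of FGA by Distance', '2P'),
])
shooting = pd.DataFrame([[1.0, 'Atlanta Hawks', 0.646]], columns=columns)
flat = flatten_columns(shooting)
```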
