<a href="https://colab.research.google.com/github/cloudflare/radar-notebooks/blob/main/notebooks/example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Cloudflare Radar API

Example notebook demonstrating how to use the [Cloudflare Radar](http://radar.cloudflare.com/) API. Check out the [developer documentation](https://developers.cloudflare.com/radar/).

Data available via Radar API endpoints is made available under the [CC BY-NC 4.0 license](https://creativecommons.org/licenses/by-nc/4.0/).



### Dependencies

In [None]:
## Dependencies
!pip install requests pandas plotly



In [None]:
import io
import json
from getpass import getpass
from pprint import pprint

import requests

import plotly.express as px
import plotly.graph_objects as go
import pandas as pd



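def show_errors(text):
  """Raise an exception carrying the API's error message if the response body reports errors."""
  if "errors" in text:
    print("Request returned an error. Check below for cause and possible fixes.")
    # Use the text passed in, not the global response object.
    err = text
    try:
      # Error responses are JSON; pull out the first error message when possible.
      d = json.loads(text)
      err = d["errors"][0]["message"]
    except (ValueError, KeyError, IndexError, TypeError):
      # Not JSON, or not in the expected shape: fall back to the raw body.
      pass
    raise Exception(err)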
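
To see what the error handling above does in isolation, here is a minimal sketch that needs no network access. The function name `extract_error_message` and the sample payload are hypothetical and only mirror the `errors[0].message` shape that `show_errors` reads; a real error body may carry more fields.

```python
import json


def extract_error_message(text):
    """Return the first error message from a JSON body, or the raw text as a fallback."""
    try:
        payload = json.loads(text)
        return payload["errors"][0]["message"]
    except (ValueError, KeyError, IndexError, TypeError):
        # Body is not JSON or lacks the expected structure.
        return text


# Hypothetical payload, for illustration only.
sample = '{"success": false, "errors": [{"message": "example failure"}]}'
print(extract_error_message(sample))
print(extract_error_message("plain text body"))
```

Catching specific exception types keeps unrelated problems, such as a keyboard interrupt, from being silently swallowed.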

## Getting Started

To make your first request to Cloudflare’s Radar API, you must obtain your [API token](https://developers.cloudflare.com/fundamentals/api/get-started/create-token/). You can create a `Custom Token`, with the `User - User Details` permissions group, and an `Edit` access level.

Once you have the token, you are ready to make your first request to the API hosted at https://api.cloudflare.com/client/v4/radar/.

In [None]:
BEARER_TOKEN = getpass() # "your-bearer-token" # TO EDIT



In [None]:
BASE_API_URL = "https://api.cloudflare.com/client/v4/radar"
AUTH_HEADER = {
    "Authorization": f"Bearer {BEARER_TOKEN}",
}
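
The requests in this notebook write their query strings by hand. As a possible alternative, the sketch below assembles the same kind of URL from dictionaries using only the standard library. The helper name `build_radar_url` is hypothetical; the endpoint and parameter names are the ones used in the next section.

```python
from urllib.parse import urlencode

BASE_API_URL = "https://api.cloudflare.com/client/v4/radar"


def build_radar_url(endpoint, series, **global_params):
    """Build a Radar URL from per-series parameter dicts plus shared parameters."""
    pairs = []
    for s in series:
        # Each series contributes its own name/dateRange/location pairs, in order.
        pairs.extend(s.items())
    pairs.extend(global_params.items())
    return f"{BASE_API_URL}/{endpoint}?{urlencode(pairs)}"


url = build_radar_url(
    "netflows/timeseries",
    [{"name": "traffic", "dateRange": "7d"}],
    format="csv",
)
print(url)
```

Passing a list of pairs to `urlencode` keeps repeated keys, which matters when several series are requested at once.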

## Netflows, aka Internet Traffic Change

[Netflows](https://en.wikipedia.org/wiki/NetFlow) shows eyeball network traffic data collected from Cloudflare’s edge routers, aka Radar’s Internet Traffic Change.

Netflows includes all kinds of traffic our routers get, not just traffic to websites served by the Cloudflare CDN product.

### Get last 7 days of global traffic

In [None]:
last_week_traffic_change="netflows/timeseries?dateRange=7d&format=csv&name=traffic"
r = requests.get(f"{BASE_API_URL}/{last_week_traffic_change}", headers=AUTH_HEADER)
show_errors(r.text)
df = pd.read_csv(io.StringIO(r.text))
#df.columns = df.columns.str.replace(' ', '_')
df.head()

Unnamed: 0,Traffic timestamps,Traffic values
0,2022-11-04T16:00:00Z,0.956361
1,2022-11-04T17:00:00Z,0.928291
2,2022-11-04T18:00:00Z,0.872635
3,2022-11-04T19:00:00Z,0.883436
4,2022-11-04T20:00:00Z,0.877827


In [None]:
fig = px.line(df, x="Traffic timestamps", y="Traffic values", title='Global Internet Traffic Change')
fig.show()

### Compare one location to another: France vs Canada



In [None]:
france="name=france&dateRange=7d&location=FR"
canada="name=canada&dateRange=7d&location=CA"
global_params="format=csv&aggInterval=1h"
r = requests.get(f"{BASE_API_URL}/netflows/timeseries?{france}&{canada}&{global_params}", headers=AUTH_HEADER)
show_errors(r.text)
df = pd.read_csv(io.StringIO(r.text))
df.head()


fig = go.Figure()
fig.add_trace(go.Scatter(
                x=df['France timestamps'],
                y=df["France values"],
                name="France"))
fig.add_trace(go.Scatter(
                x=df['Canada timestamps'],
                y=df["Canada values"],
                mode='lines',
                name="Canada"))
fig.update_layout(
    xaxis_title="Time", title = "Hourly traffic from France and Canada - Last 7 days"
)
fig.show()


Cloudflare received more traffic from France during this timeframe. All timestamps are in UTC, so in order to understand at what time each location peaks in Internet usage, you'd have to convert to local time.
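
As a rough illustration of that conversion, the sketch below shifts one UTC timestamp, in the format the API returns, to French local time. It assumes a fixed UTC+1 offset (CET), which held for France in early November 2022; the helper name `utc_to_local` is hypothetical, and code that spans daylight saving changes would need a real time zone database instead.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offset for France in early November 2022 (CET, UTC+1).
CET = timezone(timedelta(hours=1), name="CET")


def utc_to_local(ts, tz):
    """Parse an ISO timestamp ending in 'Z' and convert it to the given time zone."""
    utc_dt = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return utc_dt.astimezone(tz)


local = utc_to_local("2022-11-04T16:00:00Z", CET)
print(local.isoformat())  # 2022-11-04T17:00:00+01:00
```

Within the notebook itself, something like `pd.to_datetime(df["France timestamps"], utc=True).dt.tz_convert("Europe/Paris")` should achieve the same on a whole column, with daylight saving handled by the time zone name.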

### Compare different time ranges: explore the Tonga outage using Netflows

Let’s compare Tonga in April versus January 2022, when there was an outage (see [blog post](https://blog.cloudflare.com/tonga-internet-outage/)) due to a volcanic eruption.

In [None]:
tonga_april="name=tonga_april&dateStart=2022-04-14T02%3A00%3A00Z&dateEnd=2022-04-18T08%3A00%3A00Z&location=TO"
tonga_outage="name=tonga_outage&dateStart=2022-01-14T02%3A00%3A00Z&dateEnd=2022-01-18T08%3A00%3A00Z&location=TO"
global_params="format=csv&aggInterval=1h"
r = requests.get(f"{BASE_API_URL}/netflows/timeseries?{tonga_april}&{tonga_outage}&{global_params}", headers=AUTH_HEADER)
show_errors(r.text)
df = pd.read_csv(io.StringIO(r.text))
df.head()
fig = go.Figure()
# Both traces use the January timestamps on the x-axis so the two periods overlay.
fig.add_trace(go.Scatter(
                x=df['Tonga_outage timestamps'],
                y=df["Tonga_april values"],
                name="Tonga April 2022"))
fig.add_trace(go.Scatter(
                x=df['Tonga_outage timestamps'],
                y=df["Tonga_outage values"],
                mode='lines',
                name="Tonga January 2022"))
fig.update_layout(
    xaxis_title="Time", title = "Tonga - Hourly traffic - January vs April 2022"
)
fig.show()
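
The percent-encoded `dateStart` and `dateEnd` values in the cell above can also be generated from `datetime` objects instead of being typed by hand. A minimal sketch, assuming a hypothetical helper named `date_range_params`:

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def date_range_params(name, start, end, location):
    """Return a query-string fragment for one series with an explicit date range."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return urlencode({
        "name": name,
        "dateStart": start.strftime(fmt),  # ':' is encoded as %3A by urlencode
        "dateEnd": end.strftime(fmt),
        "location": location,
    })


tonga_outage = date_range_params(
    "tonga_outage",
    datetime(2022, 1, 14, 2, tzinfo=timezone.utc),
    datetime(2022, 1, 18, 8, tzinfo=timezone.utc),
    "TO",
)
print(tonga_outage)
```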

When did it end? Can we see the whole outage? Since it's a bigger time range, we can't look at it hourly, so let's look at it with a *daily* aggregation interval.

In [None]:
tonga_outage="name=tonga_outage&dateStart=2022-01-05T02%3A00%3A00Z&dateEnd=2022-03-18T08%3A00%3A00Z&location=TO"
global_params="format=csv&aggInterval=1d"
r = requests.get(f"{BASE_API_URL}/netflows/timeseries?{tonga_outage}&{global_params}", headers=AUTH_HEADER)
show_errors(r.text)
df = pd.read_csv(io.StringIO(r.text))
df.head()

fig = go.Figure()

fig.add_trace(go.Scatter(
                x=df['Tonga_outage timestamps'],
                y=df["Tonga_outage values"],
                mode='lines',
                name="Tonga January 2022"))
fig.update_layout(
    xaxis_title="Time", title = "Tonga - Daily traffic from January to March 2022"
)
fig.show()

The outage lasted about 5 weeks, ending on February 22.
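
The "about 5 weeks" figure can be checked with a quick date calculation. The start date used here, 15 January 2022, is an assumption taken from the day of the eruption; the end date is the one stated above.

```python
from datetime import date

# Assumption: the outage is counted from the eruption on 15 January 2022.
outage_start = date(2022, 1, 15)
outage_end = date(2022, 2, 22)  # end date stated in the text

days = (outage_end - outage_start).days
weeks = days / 7
print(days, round(weeks, 1))  # 38 5.4
```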

## HTTP requests

Investigate adoption and usage of Internet protocols, versions and traffic types, using HTTP traffic (includes HTTP and HTTPS traffic).

### Mobile vs Desktop Internet usage in South Africa



In [None]:
series1="name=sa&botClass=LIKELY_HUMAN&dateRange=7d&location=za"
global_params="format=csv&aggInterval=1h"
r = requests.get(f"{BASE_API_URL}/http/timeseries/device_type?{series1}&{global_params}", headers=AUTH_HEADER)
show_errors(r.text)
df = pd.read_csv(io.StringIO(r.text))


fig = go.Figure()

fig.add_trace(go.Scatter(
                x=df["Sa timestamps"],
                y=df["Sa mobile"],
                mode='lines',
                stackgroup='one',
                groupnorm='percent',
                name="Mobile"))
fig.add_trace(go.Scatter(
                x=df["Sa timestamps"],
                y=df["Sa desktop"],
                mode='lines',
                stackgroup='one',
                groupnorm='percent',
                name="Desktop"))

fig.add_trace(go.Scatter(
                x=df["Sa timestamps"],
                y=df["Sa other"],
                mode='lines',
                stackgroup='one',
                groupnorm='percent',
                name="Other"))
fig.update_layout(
    xaxis_title="Time", title = "Hourly traffic (HTTP) by device type in South Africa"
)
fig.show()
df.head()

Unnamed: 0,Sa timestamps,Sa mobile,Sa desktop,Sa other
0,2022-11-04T14:00:00Z,60.275498,39.57812,0.146382
1,2022-11-04T15:00:00Z,68.534536,31.39287,0.072594
2,2022-11-04T16:00:00Z,66.511772,33.348203,0.140025
3,2022-11-04T17:00:00Z,70.020801,29.825927,0.153272
4,2022-11-04T18:00:00Z,74.129297,25.713635,0.157068


In [None]:
series1="name=south_africa&botClass=LIKELY_HUMAN&dateRange=7d&location=za"
global_params="format=csv"
r = requests.get(f"{BASE_API_URL}/http/summary/device_type?{series1}&{global_params}", headers=AUTH_HEADER)
show_errors(r.text)
df = pd.read_csv(io.StringIO(r.text))
# process csv into right shape for plotly: drop the series-name prefix from each column name
df.columns = df.columns.str.replace('South_africa', '', regex=False).str.strip()
df = df.transpose()
df.columns = ["Percent"]

fig = px.pie(df, values='Percent', names=df.index, title="Traffic distribution by device type in South Africa")
fig.show()
df.head()

Unnamed: 0,Percent
mobile,64.902182
desktop,35.01498
other,0.082839


### Device distribution differs across locations

If we compare device distribution across several locations, we can see it differs significantly. For example, Zambia has around 80% mobile device traffic, which is considerably higher than other locations and higher than the global percentage (around 56%).

In [None]:
series1="name=south_africa&botClass=LIKELY_HUMAN&dateRange=7d&location=za"
series2="name=france&botClass=LIKELY_HUMAN&dateRange=7d&location=fr"
series3="name=zambia&botClass=LIKELY_HUMAN&dateRange=7d&location=zm"
series4="name=haiti&botClass=LIKELY_HUMAN&dateRange=7d&location=ht"
series5="name=global&botClass=LIKELY_HUMAN&dateRange=7d&location="
global_params="format=csv"
r = requests.get(f"{BASE_API_URL}/http/summary/device_type?{series1}&{series2}&{series3}&{series4}&{series5}&{global_params}", headers=AUTH_HEADER)
show_errors(r.text)
df = pd.read_csv(io.StringIO(r.text))
# process csv into right shape for plotly (column names into col values)
df = df.transpose()
df.columns = ["Percent"]
df = df.reset_index()
df["location"] = df["index"].apply(lambda x: x.split(" ")[0])
df["device"] = df["index"].apply(lambda x: x.split(" ")[1])


fig = px.bar(df, x="location", y="Percent", color="device", title="Traffic by Device Type")
fig.show()

df

Unnamed: 0,index,Percent,location,device
0,South_africa mobile,65.135352,South_africa,mobile
1,South_africa desktop,34.787121,South_africa,desktop
2,South_africa other,0.077527,South_africa,other
3,France mobile,52.472408,France,mobile
4,France desktop,47.471571,France,desktop
5,France other,0.056021,France,other
6,Zambia mobile,81.276237,Zambia,mobile
7,Zambia desktop,18.70486,Zambia,desktop
8,Zambia other,0.018903,Zambia,other
9,Haiti desktop,78.354731,Haiti,desktop
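
The reshaping step above turns column names such as `South_africa mobile` into separate location and device fields. The sketch below shows the same idea with the standard library only. It assumes the summary CSV is a header row followed by a single row of values, which is inferred from the transposed table rather than confirmed, and it reuses the South Africa figures shown above.

```python
import csv
import io

# Assumed CSV layout: one header row, one value row (values from the table above).
csv_text = (
    "South_africa mobile,South_africa desktop,South_africa other\n"
    "65.135352,34.787121,0.077527\n"
)

header, values = list(csv.reader(io.StringIO(csv_text)))

rows = []
for column, value in zip(header, values):
    # Split on the last space so the device type is always the final word.
    location, device = column.rsplit(" ", 1)
    rows.append({"location": location, "device": device, "percent": float(value)})

for row in rows:
    print(row)
```

Splitting from the right is a small safeguard in case a series name ever contains a space.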


### Top locations by adoption of mobile or desktop traffic

What are the locations with the highest usage of mobile devices when accessing the Internet?

In [None]:
series1="name=mobile&botClass=LIKELY_HUMAN&dateRange=7d"
global_params="format=csv"
r = requests.get(f"{BASE_API_URL}/http/top/locations/device_type/mobile?{series1}&{global_params}", headers=AUTH_HEADER)
show_errors(r.text)
df = pd.read_csv(io.StringIO(r.text))
df

Unnamed: 0,Mobile client Country Alpha2,Mobile client Country Name,Mobile value
0,MR,Mauritania,81.861778
1,ZM,Zambia,81.276237
2,SY,Syria,81.223749
3,SD,Sudan,80.860075
4,YE,Yemen,79.058727


And desktop devices?

In [None]:
series1="name=desktop&botClass=LIKELY_HUMAN&dateRange=7d"
global_params="format=csv"
r = requests.get(f"{BASE_API_URL}/http/top/locations/device_type/desktop?{series1}&{global_params}", headers=AUTH_HEADER)
show_errors(r.text)
df = pd.read_csv(io.StringIO(r.text))
df

Unnamed: 0,Desktop client Country Alpha2,Desktop client Country Name,Desktop value
0,HT,Haiti,78.354731
1,LI,Liechtenstein,71.757837
2,MC,Monaco,68.728157
3,GI,Gibraltar,68.507069
4,SC,Seychelles,65.772613


## DNS queries

Access aggregated and anonymized DNS queries to our [1.1.1.1](https://1.1.1.1/dns/) public resolver service.

### DNS queries over time
We can look at DNS queries over time. Let's look at https://www.nasa.gov/ around the time that NASA released the first images from the James Webb Space Telescope.



In [None]:
nasa="name=nasa&dateStart=2022-07-10T00%3A00%3A00Z&dateEnd=2022-07-13T23%3A00%3A00Z&domain=www.nasa.gov" # for multiple domains, use a comma, eg. www.nasa.gov,webbtelescope.org
global_params="format=csv&aggInterval=1h" # in dns, the min agg interval is 1h
r = requests.get(f"{BASE_API_URL}/dns/timeseries?{nasa}&{global_params}", headers=AUTH_HEADER)
show_errors(r.text)
df = pd.read_csv(io.StringIO(r.text))
fig = px.line(df, x="Nasa timestamps", y="Nasa values", title='DNS queries to nasa.gov')
fig.show()
#df

It's interesting to note the 2 peaks in DNS queries to nasa.gov around the time the images were released (first set and second set of images).
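
Peaks like these can also be located programmatically rather than by eye. The sketch below is one simple approach, not what Radar itself does: it marks local maxima that reach at least half of the largest value. The function name `find_peaks` and the `hourly` list are made up for illustration and are not real query counts.

```python
def find_peaks(values, min_fraction=0.5):
    """Return indices of local maxima that reach at least min_fraction of the global max."""
    if not values:
        return []
    threshold = max(values) * min_fraction
    peaks = []
    for i in range(1, len(values) - 1):
        rises = values[i] > values[i - 1]
        falls = values[i] >= values[i + 1]
        if rises and falls and values[i] >= threshold:
            peaks.append(i)
    return peaks


# Made-up hourly values with two bumps, for illustration only.
hourly = [1.0, 1.2, 3.8, 1.5, 1.1, 2.0, 4.0, 1.3, 1.0]
print(find_peaks(hourly))  # [2, 6]
```

With the DataFrame from the cell above, `df["Nasa values"].tolist()` could be passed in place of `hourly`.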

### Top locations visiting a domain

We can also look at the geographical distribution of visitors to a domain.

In [None]:
nasa="name=nasa&dateStart=2022-07-10T00%3A00%3A00Z&dateEnd=2022-07-13T23%3A00%3A00Z&domain=www.nasa.gov" # for multiple domains, use a comma, eg. www.nasa.gov,webbtelescope.org
global_params="format=csv&aggInterval=1h" # in dns, the min agg interval is 1h
r = requests.get(f"{BASE_API_URL}/dns/top/locations?{nasa}&{global_params}", headers=AUTH_HEADER)
show_errors(r.text)
df = pd.read_csv(io.StringIO(r.text))
df

Unnamed: 0,Nasa client Country Alpha2,Nasa client Country Name,Nasa value
0,US,United States,57.925636
1,DE,Germany,3.522505
2,JP,Japan,3.32681
3,CA,Canada,3.32681
4,TW,Taiwan,3.131115
