## NumPy

- Homepage: http://www.numpy.org/
- Reference: https://docs.scipy.org/doc/numpy/reference/
- Tutorial: https://docs.scipy.org/doc/numpy/user/quickstart.html
- Internals: https://docs.scipy.org/doc/numpy/reference/internals.html

- Features:
  - a powerful N-dimensional array object
  - sophisticated (broadcasting) functions
  - tools for integrating C/C++ and Fortran code
  - useful linear algebra, Fourier transform, and random number capabilities


In [1]:
import numpy as np

In [2]:
# Examples

In [3]:
A = np.array([1,2,3])
print(A)

[1 2 3]


In [4]:
A.shape # 1d array of length 3

(3,)

In [5]:
# Basic arithmetic
print("Sum:", A.sum())
print("Mean:", A.mean())
print("Max:", A.max())

Sum: 6
Mean: 2.0
Max: 3


In [6]:
# Indexing
print("First element:", A[0])
print("Slice A[0] .. A[1]:", A[0:2])

First element: 1
Slice A[0] .. A[1]: [1 2]
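
Beyond single indices and simple slices, a few other indexing forms come up often. The following is a small sketch on the same array `A`; the variable names `last`, `reversed_A`, and `greater_than_one` are introduced here only for illustration.

```python
import numpy as np

A = np.array([1, 2, 3])

# Negative indices count from the end of the array.
last = A[-1]

# A slice with step -1 walks the array backwards.
reversed_A = A[::-1]

# A boolean mask selects the entries where the condition holds.
greater_than_one = A[A > 1]

print("Last element:", last)
print("Reversed:", reversed_A)
print("Entries greater than 1:", greater_than_one)
```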


In [7]:
# Broadcasting and elementwise arithmetic
print("Scalar multiplication:", A * 2)
print("Elementwise square:", A * A)

Scalar multiplication: [2 4 6]
Elementwise square: [1 4 9]


In [8]:
# Two dimensional arrays
B = np.array([[1,2,3],[4,5,6]])

In [9]:
print("Element at 0,0:", B[0,0])
print("Row 0:", B[0,:])
print("Col 0:", B[:,0])

Element at 0,0: 1
Row 0: [1 2 3]
Col 0: [1 4]
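
With two-dimensional arrays, reductions can run along one axis, and broadcasting lets a 1d array combine with every row of a 2d array. The sketch below reuses `B` from above; `col_means` and `centered` are illustrative names, not part of the original notebook.

```python
import numpy as np

B = np.array([[1, 2, 3], [4, 5, 6]])

print("Shape:", B.shape)              # (rows, columns)
print("Column sums:", B.sum(axis=0))  # axis=0 reduces over rows
print("Row sums:", B.sum(axis=1))     # axis=1 reduces over columns

# Broadcasting: col_means has shape (3,), B has shape (2, 3).
# The 1d array is subtracted from every row of B.
col_means = B.mean(axis=0)
centered = B - col_means
print("Centered columns:")
print(centered)
```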


In [10]:
!ls ../../datasets/

API_latencies.csv		    HistogramExample_quotes.csv
api_latency_histogram:1W@60sec.csv  http_durations:1w@60s.csv
api_latency_samples:1W@60sec.json   http_durations:1w@60s_out.csv
cluster_cpu_idle:1w@60s.csv	    LogDB.csv
cpu.out.csv			    LogDB_full.csv
dc1cpu.csv			    LogDB_head.csv
dtrace_histogram_a.txt		    LogDB.out
dtrace_histogram_a.txt.csv	    LogNASA.out
dtrace_histogram_b.txt		    MobileRequestRate.csv
dtrace_histogram_b.txt.csv	    ReqMultiNode.csv
HistogramAPI_10.csv		    ReqMultiNode_long.csv
HistogramAPI_15.csv		    ReqRate4w.csv
HistogramAPI_1800.json		    ReqRateTrend.csv
HistogramAPI_180.json		    request_rate_cluster:4w@5M.csv~
HistogramAPI.csv		    request_rate_cluster:6h@5M.csv
HistogramAPI.json		    request_rates.csv
HistogramAPI_samples_10.csv	    RequestRates.csv
HistogramAPI_samples_15.csv	    sycalls:1d@60s.csv
HistogramAPI_samples_1.csv	    syscall_latency@1day.json
HistogramAPI_samples_5.csv	    SystemCpu.out
HistogramAPI_samples_all.csv	    WebLatency.csv

In [11]:
# Loading data

X = np.loadtxt("../../datasets/web_request_rate:4w@5M.csv", delimiter=",", skiprows=1)
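
To see what `np.loadtxt` returns without access to the dataset, here is a minimal sketch that parses a small in-memory CSV. The header names and values are made up for illustration and are not the layout of the real file; `io.StringIO` stands in for the file path.

```python
import io
import numpy as np

# Hypothetical stand-in for a CSV file: one header row, then numeric rows.
csv_text = "timestamp,value\n0,10.5\n300,12.0\n600,9.5\n"

# delimiter="," splits fields on commas; skiprows=1 skips the header line.
X_demo = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

print("Shape:", X_demo.shape)        # (rows, columns)
print("dtype:", X_demo.dtype)        # loadtxt parses to float by default
print("Second column:", X_demo[:, 1])
```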

In [12]:
# Exercises:

# 1. How many rows and columns does the loaded dataset have?

# 2. Calculate the min/max of the first column. Is the first column sorted? (Hint: use np.sort, np.all; np.msort was removed in NumPy 2.0)

# 3. Calculate the min/max/mean of the second column

# 4. Use np.percentile to compute the 0th, 10th, 20th, ..., 100th percentiles of the second column
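
As a warm-up for the exercises, the sketch below applies the relevant functions to a small toy array rather than to the loaded dataset. The array `x` and its values are invented for illustration, and the sortedness check via `np.diff` is one possible approach among several.

```python
import numpy as np

# Toy data, not taken from the dataset.
x = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])

print("Min:", x.min(), "Max:", x.max(), "Mean:", x.mean())

# An array is sorted in ascending order if no consecutive difference is negative.
is_sorted = bool(np.all(np.diff(x) >= 0))
print("Sorted:", is_sorted)

# np.percentile accepts a sequence of percentiles and returns one value per entry.
qs = np.arange(0, 101, 10)  # 0, 10, ..., 100
pcts = np.percentile(x, qs)
print("Percentiles:", pcts)
```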