# **NumPy**

This library, whose name stands for Numerical Python, is the core on which many other Python libraries have been built. NumPy is the fundamental library for scientific computing in Python: it provides high-performance data structures and functions that core Python does not.

NumPy adds support for creating large, multidimensional vectors and matrices, together with a large collection of high-level mathematical functions to operate on them.

## Importing the NumPy library

In [2]:
import numpy as np 

In [3]:
x = np.array([1,2,3])
x

array([1, 2, 3])

In [4]:
np.sin(x)

array([0.84147098, 0.90929743, 0.14112001])

In [5]:
from math import sin
seno = []
for i in [1,2,3]:
    seno.append(sin(i))
seno

[0.8414709848078965, 0.9092974268256817, 0.1411200080598672]

In [6]:
[sin(i) for i in [1,2,3]]

[0.8414709848078965, 0.9092974268256817, 0.1411200080598672]

In [7]:
x = np.array([[1,2,3], [4,5,6]])
x.shape

(2, 3)

In [8]:
y = np.array([[1,2,3,4], [5,6,7,8],[9,10,11,12]])
y

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [9]:
y[0, :]

array([1, 2, 3, 4])

In [10]:
for i in [0,1,2]:
    print(y[i, :])

[1 2 3 4]
[5 6 7 8]
[ 9 10 11 12]


In [11]:
for j in [0,1,2,3]:
    print(y[:, j])

[1 5 9]
[ 2  6 10]
[ 3  7 11]
[ 4  8 12]


In [12]:
y[: , 1:]

array([[ 2,  3,  4],
       [ 6,  7,  8],
       [10, 11, 12]])

In [13]:
y[: , ::-2]

array([[ 4,  2],
       [ 8,  6],
       [12, 10]])
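
The row and column loops above hard-code the index lists `[0,1,2]` and `[0,1,2,3]`. As a minimal sketch (the variable names `rows` and `cols` are illustrative, not part of the original notebook), the same traversal can be written by iterating over the array itself and over its transpose, so the code keeps working if the shape changes:

```python
import numpy as np

y = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Iterating over a 2-D array yields one row per step.
rows = [row.tolist() for row in y]

# Iterating over the transpose yields one column per step.
cols = [col.tolist() for col in y.T]

# The shape gives the loop bounds if explicit indices are still needed.
n_rows, n_cols = y.shape

print(rows)
print(cols)
print(n_rows, n_cols)
```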

## **Intrinsic array creation**

NumPy provides a set of functions that generate ndarrays with initial content; the values depend on the function used.



In [14]:
np.zeros((3,3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [15]:
np.ones((3,3))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [16]:
np.arange(0,10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [17]:
np.arange(4,10)

array([4, 5, 6, 7, 8, 9])

In [18]:
np.arange(0,12,3)

array([0, 3, 6, 9])

In [19]:
np.arange(0,6,0.6)

array([0. , 0.6, 1.2, 1.8, 2.4, 3. , 3.6, 4.2, 4.8, 5.4])

In [20]:
np.arange(0,12).reshape(3,4)

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [21]:
np.linspace(0,10,5)

array([ 0. ,  2.5,  5. ,  7.5, 10. ])

In [22]:
np.random.random(3)

array([0.24820569, 0.30105876, 0.69074629])

In [23]:
np.random.random((5,4))

array([[0.29564859, 0.5490392 , 0.78882203, 0.58696268],
       [0.24270592, 0.16715358, 0.90833026, 0.87282863],
       [0.92720586, 0.41248044, 0.77047853, 0.80926278],
       [0.11224708, 0.11487233, 0.00954723, 0.06874914],
       [0.33060075, 0.21111713, 0.98313866, 0.80805959]])
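
A few more constructors follow the same pattern as the ones shown in this section. The sketch below is an illustration only (the variable names are made up for the example): it uses `np.full`, `np.eye`, the `dtype` argument, and a seeded generator from `np.random.default_rng` so that random draws can be repeated between runs.

```python
import numpy as np

# Every entry initialised to the same value.
filled = np.full((2, 3), 7)

# 3x3 identity matrix (floating point by default).
identity = np.eye(3)

# The dtype argument overrides the float default of np.zeros.
int_zeros = np.zeros((2, 2), dtype=int)

# A seeded generator returns the same draws every time it is recreated.
rng = np.random.default_rng(seed=0)
sample = rng.random(3)  # three floats in the interval [0, 1)

print(filled)
print(identity)
print(int_zeros)
print(sample)
```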

## **Basic operations**

In [24]:
a = np.arange(0,4)
a

array([0, 1, 2, 3])

In [25]:
a+4

array([4, 5, 6, 7])

In [26]:
2*a

array([0, 2, 4, 6])

In [27]:
A = np.matrix(np.arange(0,12).reshape(3,4))
B = np.matrix(np.arange(12,24).reshape(4,3))


In [28]:
np.dot(A,B)

matrix([[114, 120, 126],
        [378, 400, 422],
        [642, 680, 718]])

In [29]:
np.dot(B,A)

matrix([[164, 203, 242, 281],
        [200, 248, 296, 344],
        [236, 293, 350, 407],
        [272, 338, 404, 470]])

In [30]:
a = np.arange(0,12).reshape(3,4)
b = np.arange(12,24).reshape(3,4)
b*a

array([[  0,  13,  28,  45],
       [ 64,  85, 108, 133],
       [160, 189, 220, 253]])

In [31]:
a*b

array([[  0,  13,  28,  45],
       [ 64,  85, 108, 133],
       [160, 189, 220, 253]])
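
The cells above show two different products: `np.dot` on `np.matrix` objects computes the matrix product, while `*` on plain arrays multiplies element by element. As a hedged sketch, the same matrix product can be obtained from ordinary ndarrays with the `@` operator, which avoids `np.matrix` (the NumPy documentation no longer recommends that class for new code):

```python
import numpy as np

A = np.arange(0, 12).reshape(3, 4)
B = np.arange(12, 24).reshape(4, 3)

# Matrix product: (3, 4) @ (4, 3) -> (3, 3)
product = A @ B

# Element-wise product needs matching (or broadcastable) shapes,
# so multiply A by an array of the same shape.
elementwise = A * np.arange(12, 24).reshape(3, 4)

print(product)
print(elementwise)
```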


## **Universal functions**

In [32]:
a = np.arange(1,6)
a

array([1, 2, 3, 4, 5])

In [33]:
np.sqrt(a)

array([1.        , 1.41421356, 1.73205081, 2.        , 2.23606798])

In [34]:
np.log(a)

array([0.        , 0.69314718, 1.09861229, 1.38629436, 1.60943791])

In [35]:
np.cos(a)

array([ 0.54030231, -0.41614684, -0.9899925 , -0.65364362,  0.28366219])

## **Aggregate functions**

In [36]:
a = 5*np.random.random(5)+1
a

array([2.00049236, 3.24950197, 5.26088171, 5.68897262, 1.23264129])

In [37]:
a.min(), a.max(), a.sum(), a.mean(), a.std()

(1.23264129318121,
 5.68897262301388,
 17.432489947217626,
 3.4864979894435253,
 1.7517752112640923)
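
On a one-dimensional array these aggregates reduce everything to a single number. For a two-dimensional array, the `axis` argument selects the direction of the reduction. A small sketch on a fixed array (chosen here so the results are easy to check by hand):

```python
import numpy as np

m = np.arange(1, 7).reshape(2, 3)  # [[1, 2, 3], [4, 5, 6]]

col_sums = m.sum(axis=0)   # collapse the rows: one sum per column
row_sums = m.sum(axis=1)   # collapse the columns: one sum per row
row_max = m.max(axis=1)    # largest value in each row
overall_mean = m.mean()    # no axis: aggregate over every element

print(col_sums, row_sums, row_max, overall_mean)
```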

## **Indexing**

In [38]:
a = np.arange(10, 16)
a

array([10, 11, 12, 13, 14, 15])

In [39]:
a[4], a[-1], a[0],a[-6]

(14, 15, 10, 10)

In [40]:
a[[1, 3, 4]]

array([11, 13, 14])

In [41]:
A = np.arange(10,19).reshape((3,3))
A

array([[10, 11, 12],
       [13, 14, 15],
       [16, 17, 18]])

In [42]:
A[1,2]

15

## **Slicing**


In [43]:
a = np.arange(10,16)
a

array([10, 11, 12, 13, 14, 15])

In [44]:
a[1:4]

array([11, 12, 13])

In [45]:
a[1:5:2]

array([11, 13])

In [46]:
a[::2]

array([10, 12, 14])

In [47]:
a[:5:2]

array([10, 12, 14])

In [48]:
a[:5:]

array([10, 11, 12, 13, 14])

In [49]:
b = np.arange(10,31)
b

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26,
       27, 28, 29, 30])

In [50]:
b[1:14:2]

array([11, 13, 15, 17, 19, 21, 23])

In [51]:
b[:21:]

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26,
       27, 28, 29, 30])

In [52]:
A = np.arange(10,19).reshape((3,3))
A

array([[10, 11, 12],
       [13, 14, 15],
       [16, 17, 18]])

In [53]:
A[0,:]

array([10, 11, 12])

In [54]:
A[:, 0]

array([10, 13, 16])

In [55]:
A[0:2, 0:2]

array([[10, 11],
       [13, 14]])

In [56]:
A[[0,2], 0:2]

array([[10, 11],
       [16, 17]])

## **Conditionals and Boolean arrays**

In [57]:
A = np.random.random((4,4))
A 

array([[0.54214321, 0.36684957, 0.21711946, 0.33465672],
       [0.05461352, 0.81465993, 0.04733058, 0.89299234],
       [0.04534219, 0.64567807, 0.09082792, 0.76823304],
       [0.3813032 , 0.16923522, 0.21546413, 0.56140014]])

In [58]:
A>0.5

array([[ True, False, False, False],
       [False,  True, False,  True],
       [False,  True, False,  True],
       [False, False, False,  True]])

In [59]:
A[A>0.5]

array([0.54214321, 0.81465993, 0.89299234, 0.64567807, 0.76823304,
       0.56140014])
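
Boolean masks like `A>0.5` can be combined before they are used for selection. The sketch below uses a small fixed array instead of random values so the result is repeatable; `&` and `|` are the element-wise "and" and "or", and each comparison needs its own parentheses. `np.where` is shown as one way to build a new array from a condition.

```python
import numpy as np

A = np.array([[0.1, 0.6],
              [0.9, 0.3]])

# Element-wise AND of two conditions.
mask = (A > 0.25) & (A < 0.75)

# Selecting with a mask returns a flat array of the matching values.
selected = A[mask]

# np.where picks 1 where the condition holds and 0 elsewhere.
flags = np.where(A > 0.5, 1, 0)

print(mask)
print(selected)
print(flags)
```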

# **Pandas**

pandas is an open-source Python library for highly specialized data analysis. It is currently the reference tool that professionals who use Python for statistical analysis and decision-making need to learn.

In [60]:
import pandas as pd 

In [61]:
s = pd.Series([12, -4, 7,9])
s

0    12
1    -4
2     7
3     9
dtype: int64

In [62]:
s = pd.Series([12, -4, 7,9], index= ['a', 'b', 'c', 'd'])
s


a    12
b    -4
c     7
d     9
dtype: int64

In [198]:
s.values, s.index

(array([12, -4,  7,  9]), Index(['a', 'b', 'c', 'd'], dtype='object'))

## **Selecting internal elements**

In [63]:
s.iloc[2]  # positional access; plain s[2] on a label-indexed Series is deprecated in recent pandas

7

In [64]:
s['b']

-4

In [65]:
s[0:2]

a    12
b    -4
dtype: int64

In [67]:
s.iloc[[0, 2]], s[['a', 'c']]

(a    12
 c     7
 dtype: int64,
 a    12
 c     7
 dtype: int64)

## **Assigning values to elements**

In [69]:
s.iloc[1] = 0  # assign by position
s

a    12
b     0
c     7
d     9
dtype: int64

In [71]:
s['b'] = 1
s

a    12
b     1
c     7
d     9
dtype: int64

## **Defining a Series from an array and from other Series**

In [79]:
arr = np.arange(1,10,2)
s3 = pd.Series(arr)
s3

0    1
1    3
2    5
3    7
4    9
dtype: int64

In [85]:
s4 = pd.Series(s)
s4

a    12
b     1
c     7
d     9
dtype: int64

## **Filtering values**

Because pandas is built on NumPy, and its data structures are built on NumPy arrays, many operations that apply to NumPy arrays extend to Series.

In [87]:
s[s>8]

a    12
d     9
dtype: int64

In [93]:
colors = pd.Series([1,0,2,1,2,3], index=['white','white','blue','green','green','yellow'])
colors

white     1
white     0
blue      2
green     1
green     2
yellow    3
dtype: int64

In [100]:
colors.unique()

array([1, 0, 2, 3])

In [101]:
colors.value_counts()

1    2
2    2
0    1
3    1
dtype: int64

In [102]:
colors.isin([0,3])

white     False
white      True
blue      False
green     False
green     False
yellow     True
dtype: bool

In [103]:
colors[colors.isin([0,3])]

white     0
yellow    3
dtype: int64

## **NaN values**

In [104]:
s2 = pd.Series([5, -3, np.nan, 14])  # np.NaN was removed in NumPy 2.0; np.nan works in all versions
s2

0     5.0
1    -3.0
2     NaN
3    14.0
dtype: float64

In [106]:
s2[s2.isnull()]

2   NaN
dtype: float64

In [108]:
s2[s2.notnull()]

0     5.0
1    -3.0
3    14.0
dtype: float64

## **Series and dictionaries**

In [109]:
mydict = {'red': 2000, 'blue': 1000, 'yellow': 500, 'orange': 1000}

In [111]:
s5 = pd.Series(mydict)
s5

red       2000
blue      1000
yellow     500
orange    1000
dtype: int64

In [112]:
colors = ['red','yellow','orange','blue','green']
myseries = pd.Series(mydict, index=colors)
myseries

red       2000.0
yellow     500.0
orange    1000.0
blue      1000.0
green        NaN
dtype: float64
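
In the last cell, `'green'` is not a key of `mydict`, so pandas fills that position with `NaN` and the Series becomes `float64`. A minimal sketch of how such gaps might then be handled, using `fillna` and `dropna` (the replacement value `0` is an arbitrary choice for the example):

```python
import pandas as pd

mydict = {'red': 2000, 'blue': 1000, 'yellow': 500, 'orange': 1000}
colors = ['red', 'yellow', 'orange', 'blue', 'green']
myseries = pd.Series(mydict, index=colors)

# Labels whose value is missing.
missing = myseries[myseries.isnull()].index.tolist()

# Replace missing values with a default (0 is an arbitrary choice here)...
filled = myseries.fillna(0)

# ...or discard the entries that have no value.
dropped = myseries.dropna()

print(missing)
print(filled)
print(dropped)
```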

## **DataFrame**

The DataFrame is a tabular data structure very similar to a spreadsheet. It is designed to extend Series to multiple dimensions.

In [113]:
data = {'color': ['blue', 'green', 'yellow', 'red', 'white'],
        'object': ['ball', 'pen', 'pencil', 'paper', 'mug'],
        'price': [1.2, 1.0, 0.6, 0.9, 1.7]}

In [115]:
frame = pd.DataFrame(data)
frame

    color  object  price
0    blue    ball    1.2
1   green     pen    1.0
2  yellow  pencil    0.6
3     red   paper    0.9
4   white     mug    1.7
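
Since each column of a DataFrame is a Series, the selection and filtering techniques shown earlier carry over. The following sketch rebuilds the same `frame` and shows column access, row filtering with a Boolean mask, and a column aggregate; the threshold `1.0` is chosen only for illustration.

```python
import pandas as pd

data = {'color': ['blue', 'green', 'yellow', 'red', 'white'],
        'object': ['ball', 'pen', 'pencil', 'paper', 'mug'],
        'price': [1.2, 1.0, 0.6, 0.9, 1.7]}
frame = pd.DataFrame(data)

# A single column is a Series that shares the frame's row index.
prices = frame['price']

# A Boolean mask on one column filters whole rows.
expensive = frame[frame['price'] > 1.0]

# .loc combines a row mask with a column label.
expensive_colors = frame.loc[frame['price'] > 1.0, 'color'].tolist()

mean_price = prices.mean()

print(expensive)
print(expensive_colors)
print(mean_price)
```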


## **Mathematical operations and functions**

As with filtering, the arithmetic operators (+, -, *, /) and NumPy's mathematical functions can be applied to Series.

In [90]:
s/2

a    6.0
b    0.5
c    3.5
d    4.5
dtype: float64

In [91]:
np.log(s)

a    2.484907
b    0.000000
c    1.945910
d    2.197225
dtype: float64
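
One difference from plain NumPy arrays is worth a short sketch: when two Series are combined, pandas aligns them by index label rather than by position, and labels present in only one operand produce `NaN`. The Series `s1` and `s2` below are made up for the example.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Values are matched by label: only 'b' and 'c' exist in both operands.
total = s1 + s2

# NumPy functions applied to a Series return a Series with the same index.
roots = np.sqrt(s1)

print(total)
print(roots)
```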

# **References**

[1] https://es.wikipedia.org/wiki/NumPy

[2] Unpingco, J. Python for Probability, Statistics, and Machine Learning. 2nd ed. Springer.