# Chapter 5. Sequences
- Arrays
- Ranges

- collections
 - used when working with a set of data rather than a single value
- sequences
 - sequential collections: collections whose elements have an order
 - let a program handle many values efficiently under a single variable name
 - this class mainly uses arrays
   - these are actually the ndarray data type from the NumPy package [Go](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html)

In [1]:
from datascience import *
english_parts_of_speech = make_array("noun", "pronoun", "verb", "adverb", "adjective", "conjunction", "preposition", "interjection")
english_parts_of_speech


array(['noun', 'pronoun', 'verb', 'adverb', 'adjective', 'conjunction',
       'preposition', 'interjection'], dtype='<U12')

In [2]:
baseline_high = 14.48
highs = make_array(baseline_high - 0.880, baseline_high - 0.093,
                   baseline_high + 0.105, baseline_high + 0.684)
highs
type(highs)

numpy.ndarray

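- An array variable can be used directly in an arithmetic expression
 - vector-scalar operations: the scalar is applied to every element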

In [3]:
(9/5) * highs + 32  # convert Celsius to Fahrenheit

array([56.48  , 57.8966, 58.253 , 59.2952])

- An array can be passed to a function as an argument

In [4]:
sum(highs)

57.736000000000004

In [5]:
len(highs)

4

In [6]:
sum(highs)/len(highs)

14.434000000000001

- The same computations are also available through the array's own attributes and methods

In [7]:
highs.size

4

In [8]:
highs.sum()

57.736000000000004

In [9]:
highs.sum()/highs.size

14.434000000000001

In [10]:
highs.mean()  # compute the mean

14.434000000000001

In [11]:
import numpy as np
np.diff(highs)

array([0.787, 0.198, 0.579])

- Let's look at the numpy package in a little more detail.
 - Many methods and functions are available, but this class uses only some of them.
 - The tables below summarize the main methods/functions.

- Input: array, output: single value

|Function|Description|
--|--
| np.prod | Multiply all elements together |
| np.sum  | Add all elements together |
| np.all  | Test whether all elements are true values (non-zero numbers are true) |
| np.any  | Test whether any elements are true values (non-zero numbers are true) |
| np.count_nonzero  | Count the number of non-zero elements |
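
As a minimal sketch of the reductions in the table above, the cell below applies each function to one small array. The array `values` and its contents are made up for illustration and do not come from the text.

```python
import numpy as np

# Illustrative data (an assumption, not from the text); note the single 0.
values = np.array([2, 3, 0, 5])

product = np.prod(values)            # 2 * 3 * 0 * 5 = 0
total = np.sum(values)               # 2 + 3 + 0 + 5 = 10
all_true = np.all(values)            # False: the 0 counts as a false value
any_true = np.any(values)            # True: at least one element is non-zero
nonzero = np.count_nonzero(values)   # 3 elements are not 0

print(product, total, all_true, any_true, nonzero)
```

Each call collapses the whole array into one value, which is why these functions sit in the "single value" group.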


- Input: array, output: array

|Function|Description|
--|--
| np.diff   | Difference between adjacent elements |
| np.round  | Round each number to the nearest integer (whole number) |
| np.cumprod  | A cumulative product: for each element, multiply all elements so far  |
| np.exp  | Exponentiate each element |
| np.log  | Take the natural logarithm of each element |
| np.sqrt  | Take the square root of each element |
| np.sort  | Sort the elements |
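
The element-wise functions in the table above can be sketched as follows. The sample arrays are assumptions chosen so that the results are easy to check by hand.

```python
import numpy as np

# Illustrative data (an assumption, not from the text).
values = np.array([4.0, 1.0, 9.0, 16.0])

diffs = np.diff(values)        # [-3.,  8.,  7.]  (one element shorter than the input)
running = np.cumprod(values)   # [4., 4., 36., 576.]
roots = np.sqrt(values)        # [2., 1., 3., 4.]
ordered = np.sort(values)      # [1., 4., 9., 16.]
rounded = np.round(np.array([1.2, 2.7, 3.49]))  # [1., 3., 3.]

# np.log undoes np.exp, up to floating-point rounding.
logs_back = np.log(np.exp(values))

print(diffs, running, roots, ordered, rounded, logs_back)
```

Except for `np.diff`, which returns one fewer element, each result has the same length as its input.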

- Input: array of strings, output: array

|Function|Description|
--|--
| np.char.lower   | Lowercase each element |
|  np.char.upper   | Uppercase each element |
| np.char.strip   |Remove spaces at the beginning or end of each element  |
|  np.char.isalpha    | Whether each element is only letters (no numbers or symbols) |
|  np.char.isnumeric  | Whether each element is only numeric (no letters) |
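
A short sketch of the string functions in the table above. The array `words` is a made-up example with stray spaces, mixed case, and digits so that each function has something to do.

```python
import numpy as np

# Illustrative data (an assumption, not from the text).
words = np.array(["  Noun", "VERB ", "adverb2", "42"])

stripped = np.char.strip(words)            # ['Noun', 'VERB', 'adverb2', '42']
lowered = np.char.lower(stripped)          # ['noun', 'verb', 'adverb2', '42']
uppered = np.char.upper(stripped)          # ['NOUN', 'VERB', 'ADVERB2', '42']
letters_only = np.char.isalpha(stripped)   # [True, True, False, False]
digits_only = np.char.isnumeric(stripped)  # [False, False, False, True]

print(stripped, lowered, uppered, letters_only, digits_only)
```

The first three functions return arrays of strings, while `np.char.isalpha` and `np.char.isnumeric` return arrays of booleans.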

- Input: array of strings and a search string, output: array

|Function|Description|
--|--
|  np.char.count    | Count the number of times a search string appears within each element of an array |
|  np.char.find    | The position within each element that a search string is found first |
|  np.char.rfind   | The position within each element that a search string is found last  |
|  np.char.startswith     | Whether each element starts with the search string |
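
The search functions in the table above can be tried on the parts-of-speech array from the start of the chapter. This sketch builds the array with `np.array` rather than `make_array` so that it only depends on numpy; the search strings `"n"` and `"pr"` are arbitrary choices.

```python
import numpy as np

parts = np.array(["noun", "pronoun", "verb", "adverb", "adjective",
                  "conjunction", "preposition", "interjection"])

n_counts = np.char.count(parts, "n")        # [2, 2, 0, 0, 0, 3, 1, 2]
first_n = np.char.find(parts, "n")          # [0, 3, -1, -1, -1, 2, 10, 1]
last_n = np.char.rfind(parts, "n")          # [3, 6, -1, -1, -1, 10, 10, 11]
starts_pr = np.char.startswith(parts, "pr") # True only for 'pronoun' and 'preposition'

print(n_counts, first_n, last_n, starts_pr)
```

Positions are counted from 0, and `np.char.find` and `np.char.rfind` return -1 for elements that do not contain the search string.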

- Range
 - an array of increasing or decreasing numbers
 - used to represent an interval
 - created with np.arange()
  - arguments: start, end, step
  - Note) start is included, but end is excluded

In [12]:
np.arange(5)

array([0, 1, 2, 3, 4])

In [13]:
np.arange(3, 9)

array([3, 4, 5, 6, 7, 8])

In [14]:
np.arange(3, 30, 5)

array([ 3,  8, 13, 18, 23, 28])

In [15]:
np.arange(1.5, -2, -0.5)

array([ 1.5,  1. ,  0.5,  0. , -0.5, -1. , -1.5])

- Example) Leibniz's formula for $\pi$

$$ \pi \approx 4 \cdot (1- \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots ) $$

- Let's verify this with a program.
 - Compute the first 5000 terms, then group the positive and negative terms separately
$$ \pi \approx 4 \cdot (1- \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots - \frac{1}{9999}) $$
$$ \pi \approx 4 \cdot (( 1 + \frac{1}{5} + \frac{1}{9} \cdots + \frac{1}{9997} ) - (\frac{1}{3} + \frac{1}{7} + \cdots + \frac{1}{9999})) $$


In [16]:
positive_term_denominators = np.arange(1, 10000, 4)
positive_term_denominators

array([   1,    5,    9, ..., 9989, 9993, 9997])

In [17]:
positive_terms = 1 / positive_term_denominators
positive_terms

array([1.00000000e+00, 2.00000000e-01, 1.11111111e-01, ...,
       1.00110121e-04, 1.00070049e-04, 1.00030009e-04])

In [19]:
negative_terms = 1 / (positive_term_denominators + 2)
negative_terms

array([3.33333333e-01, 1.42857143e-01, 9.09090909e-02, ...,
       1.00090081e-04, 1.00050025e-04, 1.00010001e-04])

In [20]:
pi = 4 * (sum(positive_terms) - sum(negative_terms))
pi

3.1413926535917955
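
To see how slowly the Leibniz series converges, the following sketch wraps the same computation in a function and compares it with `math.pi` for several term counts. The helper name `leibniz_pi` and the alternating-sign formulation are illustrative assumptions; the cells above split positive and negative terms instead.

```python
import math
import numpy as np

def leibniz_pi(n_terms):
    """Approximate pi using the first n_terms of the Leibniz series (hypothetical helper)."""
    k = np.arange(n_terms)           # 0, 1, 2, ...
    signs = 1 - 2 * (k % 2)          # +1, -1, +1, ...
    denominators = 2 * k + 1         # 1, 3, 5, ...
    return 4 * np.sum(signs / denominators)

for n in (10, 100, 5000):
    approx = leibniz_pi(n)
    print(n, approx, abs(math.pi - approx))
```

With 5000 terms the result agrees with the value computed above to within floating-point rounding, and the error is still about 0.0002, so each additional correct digit needs roughly ten times as many terms.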

- Array-with-array operations (applied element by element)

In [21]:

baseline_high = 14.48
highs = make_array(baseline_high - 0.880, 
                   baseline_high - 0.093,
                   baseline_high + 0.105, 
                   baseline_high + 0.684)
highs


array([13.6  , 14.387, 14.585, 15.164])

In [22]:

baseline_low = 3.00
lows = make_array(baseline_low - 0.872, 
                  baseline_low - 0.629,
                  baseline_low - 0.126, 
                  baseline_low + 0.728)
lows


array([2.128, 2.371, 2.874, 3.728])

In [23]:

gaps = make_array(
    highs.item(0) - lows.item(0),
    highs.item(1) - lows.item(1),
    highs.item(2) - lows.item(2),
    highs.item(3) - lows.item(3))
gaps


array([11.472, 12.016, 11.711, 11.436])

In [24]:

gaps_another = highs - lows
gaps_another


array([11.472, 12.016, 11.711, 11.436])
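
Subtraction is not the only element-wise operation. The sketch below rebuilds `highs` and `lows` with `np.array` (so it runs without the `datascience` package) and also shows what happens when the lengths do not match; `midpoints` and `mismatch_error` are names introduced here for illustration.

```python
import numpy as np

highs = np.array([13.6, 14.387, 14.585, 15.164])
lows = np.array([2.128, 2.371, 2.874, 3.728])

gaps = highs - lows              # element-by-element difference
midpoints = (highs + lows) / 2   # array-array addition, then a scalar division

# Arrays of different lengths (4 versus 2 here) cannot be combined this way.
mismatch_error = None
try:
    highs + np.array([1.0, 2.0])
except ValueError as err:
    mismatch_error = str(err)

print(gaps)
print(midpoints)
print(mismatch_error)
```

Both arrays need the same number of elements for this pairing to work, which is why the mismatched addition raises a `ValueError`.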

- Another way to compute $\pi$
 - John Wallis's formula
 
$$ \pi = 2 \cdot ( \frac{2}{1} \cdot \frac{2}{3} \cdot \frac{4}{3} \cdot \frac{4}{5} \cdot \frac{6}{5} \cdot \frac{6}{7} \cdots ) $$

$$ \pi \approx 2 \cdot ( \frac{2}{1} \cdot \frac{4}{3} \cdot \frac{6}{5} \cdots \frac{1000000}{999999} ) \cdot (\frac{2}{3} \cdot \frac{4}{5} \cdot \frac{6}{7} \cdots \frac{1000000}{1000001} ) $$


In [25]:

even = np.arange(2, 1000001, 2)
one_below_even = even - 1
one_above_even = even + 1
2 * np.prod(even/one_below_even) * np.prod(even/one_above_even)


3.1415910827951143
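
As a rough check on how the Wallis product behaves with fewer factors, this sketch wraps the cell above in a function. The helper name `wallis_pi` and its parameter `n_even` (the number of even numerators used) are assumptions made for illustration.

```python
import math
import numpy as np

def wallis_pi(n_even):
    """Approximate pi from the Wallis product using the even numbers 2, 4, ..., 2 * n_even."""
    even = np.arange(2, 2 * n_even + 1, 2, dtype=float)
    return 2 * np.prod(even / (even - 1)) * np.prod(even / (even + 1))

for n in (10, 1000, 500000):
    approx = wallis_pi(n)
    print(n, approx, abs(math.pi - approx))
```

Calling `wallis_pi(500000)` uses the even numbers up to 1000000, matching the cell above, and the error there is on the order of 1e-6.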

#### RECAP
- Arrays
 - useful for handling a collection of values of the same data type
- Ranges
 - sequences of evenly spaced numbers