# Fall 2022: DS-GA 1011 NLP with Representation Learning
## Lab 1: 12-Sep-2022, Monday
## Introduction

### Pre-requisites
1. Python 3.7+
2. Virtual Environment
3. Jupyter Notebook/Lab

### Resources
1. [Python tutorial](https://docs.python.org/3.7/tutorial/)
2. [Python documentation](https://docs.python.org/3.7/)
3. [Python Introduction through CogSci exercise](https://colab.research.google.com/drive/1ghPQaTEdO9UH4s3gGD5OXmkYNvIwm2Zi)

### Environments
- [Local: Anaconda](https://docs.anaconda.com/anaconda/install/)
- [HPC Cluster](https://www.nyu.edu/life/information-technology/research-and-data-support/high-performance-computing.html)
- Cloud Platforms

---
### Getting Started (Shell commands - Local)

1. List conda environments
`conda env list`

2. Create new environment
`conda create --name <env_name>`

3. Activate environment
`conda activate <env_name>`

4. Get information on activated environment
`conda info`

5. List installed packages in the activated environment
`conda list` OR `conda list <package_name>` (to search for a particular package)

6. Install packages in the activated environment
`conda install -c conda-forge <package_name>` OR
`pip install <package_name>`
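
Once an environment is activated, a quick check from Python can confirm which interpreter is in use and whether it meets the Python 3.7+ prerequisite. The following is a minimal sketch using only the standard library; the variable names are illustrative and not part of the lab.

```python
import sys

# Path of the interpreter currently running. Inside an activated conda
# environment this normally points into that environment's directory.
interpreter_path = sys.executable
print("Interpreter:", interpreter_path)

# Compare the running version against the course prerequisite (3.7+).
meets_requirement = sys.version_info >= (3, 7)
print("Python version:", sys.version.split()[0])
print("Meets 3.7+ requirement:", meets_requirement)
```

If the printed path does not point into the expected environment, the environment is probably not activated in the shell that launched Python.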

---
### [Jupyter Notebook/Lab](https://jupyter.org)
Jupyter Notebook is an open-source web application for creating documents that contain live code, equations, visualizations, and narrative text. JupyterLab is an enhanced web-based interactive development interface for working with notebooks.

#### Cell Types

- Markdown: Allows adding text, images, and LaTeX equations to the notebook. Also enables text formatting, embedding code, creating tables, etc.

Resources: **Help-->Markdown/Markdown Reference**. [Cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet).
    

- Code: As the name suggests, this is where code is written in a supported language to produce text or image output. Code cells can also run shell commands (prefixed with `!`) and [magic commands](https://ipython.readthedocs.io/en/stable/interactive/magics.html) (prefixed with `%` or `%%`).
    

- Raw: Content that the notebook does not execute; it is passed through unmodified when the notebook is converted to another format. More information [here](https://nbsphinx.readthedocs.io/en/0.7.1/raw-cells.html).

#### Help
- Keyboard Shortcuts
- Reference Documentation (under **Help** menu)

#### Kernel
- Install: `python -m ipykernel install --user --name=<env_name>`
- List: `jupyter kernelspec list`
- Uninstall: `jupyter kernelspec uninstall <env_name>`

In [1]:
# Shell command
!pwd

/content


In [None]:
!which python

/usr/local/bin/python


In [None]:
# List magics
%lsmagic

Available line magics:
%alias  %alias_magic  %autoawait  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %colors  %conda  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %man  %matplotlib  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %pip  %popd  %pprint  %precision  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%debug  %%file  %%html  %%javascript  %%js  %%latex  %%markdown  %%perl  %%prun  %%pypy  %%

In [2]:
# Code
a = 2
b = 3
c = a+b
print(a, '+', b, '=', c)

2 + 3 = 5


In [None]:
# Help
?c

In [5]:
li = [1, 2, 3, 4]
ys = [2 * x + 1 for x in li]  # list comprehension
# Equivalent explicit loop:
# ys = []
# for x in li:
#   ys.append(2 * x + 1)
print(ys)

[3, 5, 7, 9]
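
The comprehension above can also filter its input or build other containers. The sketch below reuses the same `li` and the same `2 * x + 1` transformation; the names `even_ys` and `x_to_y` are made up for illustration.

```python
li = [1, 2, 3, 4]

# Keep only the even inputs, then apply the same 2 * x + 1 transformation.
even_ys = [2 * x + 1 for x in li if x % 2 == 0]
print(even_ys)  # [5, 9]

# A dictionary comprehension mapping each input to its output.
x_to_y = {x: 2 * x + 1 for x in li}
print(x_to_y)  # {1: 3, 2: 5, 3: 7, 4: 9}
```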


---
### [NumPy](https://numpy.org/doc/stable/reference/?v=20200903223413)
NumPy is a scientific computing package that provides array features such as indexing, vectorization, and broadcasting, and performs fast mathematical operations through pre-compiled C code.
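
To make the speed claim concrete, the sketch below computes a sum of squares with a plain Python loop and with a single NumPy call, then times both. The function names and the array size are arbitrary choices for illustration, and the timings depend on the machine, so no expected timings are shown.

```python
import timeit

import numpy as np

values = np.arange(100_000, dtype=np.float64)


def python_loop_sum_of_squares(data):
    """Accumulate the sum of squares one element at a time in Python."""
    total = 0.0
    for v in data:
        total += v * v
    return total


def numpy_sum_of_squares(data):
    """Compute the same quantity with one call into compiled code."""
    return float(np.dot(data, data))


loop_result = python_loop_sum_of_squares(values)
numpy_result = numpy_sum_of_squares(values)
print("Results agree:", np.isclose(loop_result, numpy_result))

# Time three repetitions of each approach.
loop_seconds = timeit.timeit(lambda: python_loop_sum_of_squares(values), number=3)
numpy_seconds = timeit.timeit(lambda: numpy_sum_of_squares(values), number=3)
print(f"loop: {loop_seconds:.4f}s, numpy: {numpy_seconds:.4f}s")
```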

In [6]:
import numpy as np

In [7]:
x = [1,2,3]

In [8]:
type(x)

list

In [10]:
array_x = np.array(x)
print(type(array_x))

<class 'numpy.ndarray'>


In [11]:
# Array creation functions
print('Zeros\n', np.zeros((3, 3)))
print('Ones\n', np.ones((3, 3)))
print('Identity\n', np.identity(3))
print('Diagonal\n', np.diag(np.array([1, 2, 3]))) # diagonal matrix
print('Range\n', np.arange(9).reshape(3,3))
print('Random\n', np.random.rand(3, 3))

Zeros
 [[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
Ones
 [[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
Identity
 [[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
Diagonal
 [[1 0 0]
 [0 2 0]
 [0 0 3]]
Range
 [[0 1 2]
 [3 4 5]
 [6 7 8]]
Random
 [[0.31590524 0.41584541 0.08077213]
 [0.02018369 0.9222255  0.35828335]
 [0.24819529 0.7491896  0.03095424]]


In [12]:
vec = np.arange(9)
print(vec)
arr = vec.reshape(3,3)
print(arr)

[0 1 2 3 4 5 6 7 8]
[[0 1 2]
 [3 4 5]
 [6 7 8]]


In [13]:
# Indexing and slicing
arr_12 = arr[1, 2]  # single element at row 1, column 2
print(arr_12)
arr_2 = arr[:, 2]  # every row of column 2
print(arr_2)

5
[2 5 8]
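
A few more indexing patterns on the same 3x3 array may help, including the fact that basic slices are views into the original data rather than copies. This is a small sketch; the variable names are illustrative.

```python
import numpy as np

arr = np.arange(9).reshape(3, 3)

first_row = arr[0]             # [0 1 2]
top_left = arr[:2, :2]         # [[0 1], [3 4]]
every_other_col = arr[:, ::2]  # columns 0 and 2
greater_than_4 = arr[arr > 4]  # boolean mask -> [5 6 7 8]

print(first_row)
print(top_left)
print(every_other_col)
print(greater_than_4)

# Basic slicing returns a view: writing through the slice changes arr.
last_col = arr[:, 2]
last_col[0] = -1
print(arr[0, 2])  # -1
```

Use `arr[:, 2].copy()` when an independent array is needed.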


In [14]:
# Broadcasting
arr + 5

array([[ 5,  6,  7],
       [ 8,  9, 10],
       [11, 12, 13]])
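
Broadcasting is not limited to scalars. Arrays with compatible shapes are stretched along dimensions of size 1 (or missing leading dimensions) to match. The sketch below adds a row vector and a column vector to the same 3x3 array; `row` and `col` are names chosen for illustration.

```python
import numpy as np

arr = np.arange(9).reshape(3, 3)
row = np.array([10, 20, 30])             # shape (3,)
col = np.array([[100], [200], [300]])    # shape (3, 1)

# row is added to every row of arr.
print(arr + row)
# [[10 21 32]
#  [13 24 35]
#  [16 27 38]]

# col is added to every column of arr.
print(arr + col)
# [[100 101 102]
#  [203 204 205]
#  [306 307 308]]

# Shapes (3,) and (3, 1) broadcast against each other to (3, 3).
print((row + col).shape)  # (3, 3)
```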

In [16]:
# Vectorization
def myfunc(a: int, b: int) -> int:
    """Return a - b if a > b, otherwise return a + b."""
    if a > b:
        return a - b
    return a + b

In [17]:
# np.vectorize wraps myfunc so it accepts arrays; the scalar 2 is broadcast
vfunc = np.vectorize(myfunc)
vfunc([1, 2, 3, 4], 2)

array([3, 4, 1, 2])
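
Note that the NumPy documentation describes `np.vectorize` as a convenience whose implementation is essentially a for loop, so it does not by itself give the speed of compiled array operations. One possible alternative is to express the same rule with `np.where`, as sketched below; `myfunc_where` is a hypothetical name, not part of the lab.

```python
import numpy as np


def myfunc_where(a, b):
    """Array version of myfunc: a - b where a > b, otherwise a + b."""
    a = np.asarray(a)
    b = np.asarray(b)
    # Both branches are computed for every element; np.where then picks
    # one per position according to the condition.
    return np.where(a > b, a - b, a + b)


result = myfunc_where([1, 2, 3, 4], 2)
print(result)  # [3 4 1 2]
```

Because both branches are evaluated everywhere, this pattern suits cheap, side-effect-free expressions like the one here.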

In [20]:
# Matrix multiplication
W = np.ones([10, 10])
x = np.ones([10])
np.matmul(W, x)

array([10., 10., 10., 10., 10., 10., 10., 10., 10., 10.])
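
The same product can be written with the `@` operator, which is equivalent to `np.matmul` for arrays. The sketch below also applies `W` to a small batch of row vectors, a pattern that comes up often with neural network layers; the batch size of 4 and the name `X` are arbitrary illustrative choices.

```python
import numpy as np

W = np.ones((10, 10))
x = np.ones(10)

# Matrix-vector product with the @ operator.
y = W @ x
print(y.shape)  # (10,)
print(np.array_equal(y, np.matmul(W, x)))  # True

# A batch of 4 row vectors: each row of X is multiplied by W.
X = np.ones((4, 10))
Y = X @ W.T
print(Y.shape)  # (4, 10)
```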

---
# References
DS-GA 1007 Programming for Data Science Fall 2018