This notebook provides recipes for loading and saving data from external sources.

**Attention**: Colab is a temporary environment with an idle timeout of 90 minutes and an absolute timeout of 12 hours.

## 0. Using the Colab environment as a terminal/command line

Prefixing any **command** with an exclamation mark makes Colab run it as a terminal command. For example, **!ls** lists the files in the current directory. Here are a few examples:


1.   `!wget "https://raw.githubusercontent.com/bingzhilee/python4linguists/main/mini-project/test_wiki_3_sentences.conll"`
      <br>*downloads the file*
2.   `!tar -xzvf wikip-small.conll.tar.gz`
      <br>*extracts the wikip-small.conll.tar.gz archive*
3.  `!rm Test.py`
      <br>*removes the file Test.py from the server*
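
For comparison, handing a command line to the shell can be approximated in plain Python with the standard `subprocess` module. This is only a sketch: `run_shell` is a hypothetical helper name, and it illustrates the idea behind `!`, not how Colab implements it internally.

```python
import subprocess


def run_shell(command):
    """Run a command line through the shell and return its standard output as text."""
    result = subprocess.run(
        command,
        shell=True,           # let the shell parse the command line, as a terminal would
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the output bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

Calling `run_shell("ls")` would then return the directory listing as a single string, which can be useful when the output needs further processing in Python.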

## 1. Loading an individual file from GitHub

+ 1). Go to the GitHub page of the file that you want to download (e.g. https://github.com/bingzhilee/python4linguists/blob/main/mini-project/test_wiki_3_sentences.conll)
+ 2). Click the ```Raw``` button and copy the new address. For the file above, the copied address is: https://raw.githubusercontent.com/bingzhilee/python4linguists/main/mini-project/test_wiki_3_sentences.conll
+ 3). In your Google Colab notebook, use the ```!wget``` command with the copied address to download the file

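You may notice that the downloadable address differs only slightly from the address of the file page. In fact, it is enough to remove '/blob' and add 'raw.' after 'https://': the resulting 'raw.github.com' address redirects to 'raw.githubusercontent.com', as the output of the second download below shows.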
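
The address rewrite can be sketched as a small Python function. This is an illustration under assumptions: `github_blob_to_raw` is a hypothetical helper name, it expects an address of the form `https://github.com/<owner>/<repo>/blob/<branch>/<path>`, and it writes the `raw.githubusercontent.com` host directly rather than relying on the redirect.

```python
from urllib.parse import urlsplit, urlunsplit


def github_blob_to_raw(url):
    """Turn a GitHub file-page address into the matching raw-content address."""
    parts = urlsplit(url)
    if parts.netloc != "github.com":
        raise ValueError("expected a github.com address")
    # Path layout assumed here: /<owner>/<repo>/blob/<branch>/<path to file>
    segments = parts.path.strip("/").split("/")
    if len(segments) < 5 or segments[2] != "blob":
        raise ValueError("expected /<owner>/<repo>/blob/<branch>/<path>")
    # Drop the 'blob' segment and switch to the raw-content host.
    raw_path = "/" + "/".join(segments[:2] + segments[3:])
    return urlunsplit(("https", "raw.githubusercontent.com", raw_path, "", ""))
```

The returned string can be pasted after `!wget` in a Colab cell.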

In [27]:
# copied raw GitHub file link
!wget "https://raw.githubusercontent.com/bingzhilee/python4linguists/main/mini-project/test_wiki_3_sentences.conll"


--2021-10-17 16:30:40--  https://raw.githubusercontent.com/bingzhilee/python4linguists/main/mini-project/test_wiki_3_sentences.conll
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.111.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2281 (2.2K) [text/plain]
Saving to: ‘test_wiki_3_sentences.conll’


2021-10-17 16:30:40 (41.5 MB/s) - ‘test_wiki_3_sentences.conll’ saved [2281/2281]



In [50]:
!wget "https://raw.github.com/bingzhilee/python4linguists/main/mini-project/wikip-small.conll.tar.gz"

--2021-10-17 16:38:26--  https://raw.github.com/bingzhilee/python4linguists/main/mini-project/wikip-small.conll.tar.gz
Resolving raw.github.com (raw.github.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.github.com (raw.github.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://raw.githubusercontent.com/bingzhilee/python4linguists/main/mini-project/wikip-small.conll.tar.gz [following]
--2021-10-17 16:38:27--  https://raw.githubusercontent.com/bingzhilee/python4linguists/main/mini-project/wikip-small.conll.tar.gz
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 69284666 (66M) [application/octet-stream]
Saving to: ‘wikip-small.conll.tar.gz’


2021-10-17 16:38:29

In [52]:
!tar -xzvf 'wikip-small.conll.tar.gz' # extract the gzip-compressed tar archive

wikip-small.conll
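
The same extraction can also be done from Python with the standard `tarfile` module. The following is a minimal sketch, not part of the original recipe; `extract_tar_gz` is a hypothetical helper name.

```python
import tarfile


def extract_tar_gz(archive_path, dest_dir="."):
    """Extract a gzip-compressed tar archive and return the names of its members."""
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        # Newer Python versions provide extraction filters that reject unsafe
        # members (e.g. absolute paths); use one when it is available.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        archive.extractall(dest_dir, **kwargs)
    return names
```

For the archive downloaded above, `extract_tar_gz("wikip-small.conll.tar.gz")` would play the role of the `!tar -xzvf` command.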


In [54]:
!ls -lh # list the contents of the current directory with detailed information
!wc -l wikip-small.conll # count the total number of lines

total 391M
drwx------ 5 root root 4.0K Oct 17 16:11 drive
drwxr-xr-x 1 root root 4.0K Oct  8 13:45 sample_data
-rw-r--r-- 1 root root 2.3K Oct 17 16:30 test_wiki_3_sentences.conll
-rw-r--r-- 1 1237  501 325M Oct 24  2019 wikip-small.conll
-rw-r--r-- 1 root root  67M Oct 17 16:38 wikip-small.conll.tar.gz
7535392 wikip-small.conll
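
A line count like the one reported by `wc -l` can be reproduced in Python without loading a large file into memory. This is a sketch with a hypothetical helper name, `count_lines`, and an arbitrarily chosen chunk size.

```python
def count_lines(path, chunk_size=1 << 20):
    """Count newline characters in a file, reading it in fixed-size binary chunks."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty read means the end of the file
                break
            total += chunk.count(b"\n")
    return total
```

Like `wc -l`, this counts newline characters, so a final line without a trailing newline is not counted.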


In [41]:
# view the first 5 lines of a (large) file
!head -n 5 test_wiki_3_sentences.conll # use "!tail file_name" to display the last 10 (default) lines


1	Requin	requin	N	NC	g=m|mwehead=P+D|n=s|s=c	9	mod	_	_
2	du	de	P+D	P+D	s=def	1	dep_cpd	_	_
3	futur	futur	N	NC	def=y|g=f|n=s|s=c	2	obj.p	_	_
4	:	:	PONCT	PONCT	s=w	9	ponct	_	_
5	Ce	ce	D	DET	g=m|n=s|s=dem	6	det	_	_
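
When the first lines are needed inside Python rather than on screen, `itertools.islice` gives a similar effect to `head`. This is a minimal sketch; the helper name `head` and the UTF-8 default encoding are assumptions.

```python
from itertools import islice


def head(path, n=5, encoding="utf-8"):
    """Return the first n lines of a text file without reading the rest of it."""
    with open(path, encoding=encoding) as f:
        # islice stops after n lines, so a large file is not read in full.
        return [line.rstrip("\n") for line in islice(f, n)]
```

For example, `head("test_wiki_3_sentences.conll")` would return the five lines shown above as a list of strings.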


## 2. Uploading files from your local file system

If you have a file on your local machine and want to upload it to the Colab server's local drive, use the following code snippet.

`files.upload` returns a dictionary of the files that were uploaded.
The dictionary is keyed by file name, and the values are the data that were uploaded.

In [55]:
from google.colab import files
uploaded = files.upload()


Saving conll.png to conll.png
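
The returned dictionary can be inspected like any other Python dictionary. The sketch below uses a stand-in dictionary with made-up content so that it also runs outside Colab; `describe_uploads` is a hypothetical helper, and the values are assumed to be the raw bytes of each file.

```python
def describe_uploads(uploaded):
    """Return (file name, size in bytes) pairs for a dict shaped like the result of files.upload()."""
    return [(name, len(data)) for name, data in uploaded.items()]


# Stand-in for the value returned by files.upload(); the file name and
# content are invented for illustration.
example = {"notes.txt": b"some content"}
for name, size in describe_uploads(example):
    print(f"{name}: {size} bytes")
```

In a Colab session, passing the `uploaded` variable from the cell above instead of `example` would list the files that were just uploaded.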


In [56]:
!ls 

conll.png  sample_data			wikip-small.conll
drive	   test_wiki_3_sentences.conll	wikip-small.conll.tar.gz


## 3. Downloading files to your local machine

If you want to save files from the server to your local machine, use the following commands.

In [57]:
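from google.colab import files
# write a small example file on the Colab server
with open('example.txt', 'w') as f:
  f.write('some content')
# ask the browser to download the file to the local machine
files.download('example.txt')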


In [59]:
!ls

conll.png  example.txt	test_wiki_3_sentences.conll  wikip-small.conll.tar.gz
drive	   sample_data	wikip-small.conll
