
# Working with Text Files


## Formatted String Literals (f-strings)

In [None]:
person = 'Nidhi'

In [None]:
print(f"My name is {person}")

My name is Nidhi


In [None]:
d = {'a':123,'b':456}

In [None]:
print(f"{d['a']}")

123


In [None]:
mylist = [1,2,3,4]

In [None]:
print(f'{mylist[0]}')

1


In [None]:
library = [('Author', 'Topic', 'Pages'), ('Twain', 'Rafting', 601), ('Feynman', 'Physics', 95), ('Hamilton', 'Mythology', 144)]

In [None]:
library

[('Author', 'Topic', 'Pages'),
 ('Twain', 'Rafting', 601),
 ('Feynman', 'Physics', 95),
 ('Hamilton', 'Mythology', 144)]

In [None]:
for author in library:
    print(f"author is {author[0]}")

author is Author
author is Twain
author is Feynman
author is Hamilton


In [None]:
for author,topic,pages in library:
    print(f"Author is {author}")

Author is Author
Author is Twain
Author is Feynman
Author is Hamilton


In [None]:
for author,topic,pages in library:
    print(f"{author} {topic} {pages}")

Author Topic Pages
Twain Rafting 601
Feynman Physics 95
Hamilton Mythology 144


### Minimum Widths, Alignment and Padding
One can pass arguments inside a nested set of curly braces to set a minimum width for the field, the alignment and even padding characters.

In [None]:
library = [('Author', 'Topic', 'Pages'), ('Twain', 'Rafting in water alone', 601), ('Feynman', 'Physics', 95), ('Hamilton', 'Mythology', 144)]

In [None]:
for author,topic,pages in library:
    print(f"{author:{10}} {topic:{30}} {pages:{10}}")

Author     Topic                          Pages     
Twain      Rafting in water alone                601
Feynman    Physics                                95
Hamilton   Mythology                             144


Here every row lines up because each minimum width is larger than the longest item in its column. Note that the header `Pages` follows the default left-alignment for strings, while the numbers are right-aligned by default. If an item were longer than its minimum field width, the rest of that row would be pushed to the right, so take the longest item into account when setting minimum field widths.

To set the alignment, use the character `<` for left-align,  `^` for center, `>` for right.<br>
To set padding, precede the alignment character with the padding character (`-` and `.` are common choices).

Let's make some adjustments:

In [None]:
for author,topic,pages in library:
    print(f"{author:{10}} {topic:{30}} {pages:.>{10}}")

Author     Topic                          .....Pages
Twain      Rafting in water alone         .......601
Feynman    Physics                        ........95
Hamilton   Mythology                      .......144
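
The three alignment characters and the padding rule described above can be compared side by side. The following is a minimal sketch; the sample word and the width of `11` are arbitrary values chosen for illustration, not part of the `library` data.

```python
# Compare the three alignment characters, each with a padding character.
word = "Twain"   # sample value, chosen only for illustration
width = 11       # hypothetical minimum field width

left = f"{word:-<{width}}"     # 'Twain------'  (padding on the right)
center = f"{word:.^{width}}"   # '...Twain...'  (padding on both sides)
right = f"{word:->{width}}"    # '------Twain'  (padding on the left)

for label, text in [("left", left), ("center", center), ("right", right)]:
    print(f"{label:<6} |{text}|")
```

The padding character always comes first, then the alignment character, then the width.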


### Date Formatting

In [None]:
from datetime import datetime

In [None]:
today = datetime(year=2023,month=1,day=6)

In [None]:
print(f"{today}")

2023-01-06 00:00:00


In [None]:
today

datetime.datetime(2023, 1, 6, 0, 0)

In [None]:
print(f"{today:%B %d, %Y}")

January 06, 2023
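
The format codes after the colon are the same directives that `strftime()` accepts. As a small sketch of that relationship, the example below reuses the date from the cells above and tries a few other common directives; month and weekday names depend on the active locale, so the English names shown in the comments assume the default locale.

```python
from datetime import datetime

today = datetime(year=2023, month=1, day=6)

# The f-string format spec and strftime() produce the same text.
via_fstring = f"{today:%B %d, %Y}"
via_strftime = today.strftime("%B %d, %Y")
print(via_fstring == via_strftime)   # True

# A few other common directives
iso_style = f"{today:%Y-%m-%d}"      # '2023-01-06'
short_style = f"{today:%d/%m/%y}"    # '06/01/23'
weekday = f"{today:%A}"              # 'Friday' in the default locale
print(iso_style, short_style, weekday)
```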


# Files

Python has a built-in `open()` function that lets us open and work with basic file types.

## Creating a File with IPython
#### `%%writefile` is a cell magic specific to Jupyter notebooks!

In [None]:
%%writefile test.txt
Hello, this is a quick test file
This is the second line of the file

Overwriting test.txt
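
Outside a notebook the same file can be created with plain Python. This is one possible equivalent, not what `%%writefile` does internally; it assumes the current working directory is writable and reuses the `test.txt` name from the cell above.

```python
# Plain-Python alternative to the %%writefile cell magic.
text = (
    "Hello, this is a quick test file\n"
    "This is the second line of the file\n"
)

# 'w' creates the file, or overwrites it if it already exists.
with open("test.txt", "w", encoding="utf-8") as f:
    f.write(text)
```

The `with` block closes the file automatically, so no explicit `close()` call is needed.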


In [None]:
pwd

'/content'

In [None]:
myfile = open('test.txt')

In [None]:
myfile

<_io.TextIOWrapper name='test.txt' mode='r' encoding='UTF-8'>

### .read() and .seek()

In [None]:
myfile.read()

'Hello, this is a quick test file\nThis is the second line of the file\n'

In [None]:
myfile.read()

''

In [None]:
myfile.seek(0)

0

In [None]:
myfile.read()

'Hello, this is a quick test file\nThis is the second line of the file\n'

In [None]:
myfile.seek(0)

0

In [None]:
content = myfile.read()

In [None]:
print(content)

Hello, this is a quick test file
This is the second line of the file



In [None]:
content

'Hello, this is a quick test file\nThis is the second line of the file\n'

In [None]:
myfile.close()
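
The cells above show that a second `.read()` returns an empty string until `.seek(0)` moves the cursor back to the start. The sketch below repeats that behaviour on a throwaway file and adds `.tell()` to report the cursor position; the file name `demo.txt` and its contents are made up for illustration.

```python
import os
import tempfile

# Work on a throwaway file so the notebook's test.txt is left alone.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo.txt")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        f.write("first line\nsecond line\n")

    with open(path, encoding="utf-8") as f:
        start = f.tell()          # 0: the cursor starts at the beginning
        first_read = f.read()     # consumes the whole file
        second_read = f.read()    # '': the cursor is already at the end
        f.seek(0)                 # move the cursor back to the start
        third_read = f.read()     # the full text again

print(repr(second_read))          # ''
print(first_read == third_read)   # True
```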

In [None]:
myfile = open('test.txt')

In [None]:
myfile.readlines()

['Hello, this is a quick test file\n', 'This is the second line of the file\n']

In [None]:
myfile.seek(0)

0

In [None]:
mylines = myfile.readlines()

In [None]:
for line in mylines:
    print(line[0])

H
T


In [None]:
for line in mylines:
    print(line.split()[0])

Hello,
This


## Writing to a File

By default, the `open()` function only allows us to read the file. To write over the file we need to pass the mode `'w'`, or `'w+'` if we also want to read from the same file object. For example:

In [None]:
myfile = open('test.txt','w+')

<div class="alert alert-danger" style="margin: 20px">**Use caution!**<br>
Opening a file with 'w' or 'w+' *truncates the original*, meaning that anything that was in the original file **is deleted**!</div>

In [None]:
myfile.read()

''

In [None]:
myfile.write('MY BRAND NEW TEXT')

17

In [None]:
myfile.seek(0)

0

In [None]:
myfile.read()

'MY BRAND NEW TEXT'

In [None]:
myfile.close()
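
The truncation warning above can be checked on a temporary file. The sketch below also shows the built-in `'x'` mode (exclusive creation), which raises `FileExistsError` instead of overwriting; the file name `notes.txt` is a placeholder.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "notes.txt")  # hypothetical file name

    with open(path, "w", encoding="utf-8") as f:
        f.write("original contents")

    # Re-opening with 'w' empties the file before anything is written.
    with open(path, "w", encoding="utf-8") as f:
        pass
    size_after_w = os.path.getsize(path)   # 0

    # 'x' (exclusive creation) refuses to touch an existing file.
    try:
        with open(path, "x", encoding="utf-8") as f:
            f.write("this is never written")
        refused = False
    except FileExistsError:
        refused = True

print(size_after_w, refused)   # 0 True
```

Using `'x'` is one way to guard against accidental overwrites when a file is only supposed to be created once.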

## Appending to a File
Passing the argument `'a'` opens the file and puts the pointer at the end, so anything written is appended. Like `'w+'`, `'a+'` lets us read and write to a file. If the file does not exist, one will be created.

In [None]:
myfile = open('whoops.txt','a+')

In [None]:
myfile.write('My first line in a+ opening')

27

In [None]:
myfile.close()

In [None]:
myfile = open('whoops.txt',mode='a+')

In [None]:
myfile.write('This is an added line, because I used a+ mode')

45

In [None]:
myfile.seek(0)

0

In [None]:
myfile.read()

'My first line in a+ openingThis is an added line, because I used a+ mode\nThis is a real new line, on the next lineMy first line in a+ openingThis is an added line, because I used a+ mode'

Note that this output contains more than the two strings just written: `whoops.txt` already existed from an earlier run of these cells, and `'a+'` keeps whatever the file already holds.

In [None]:
myfile.write('\nThis is a real new line, on the next line')

42

In [None]:
myfile.seek(0)

0

In [None]:
myfile.read()

'My first line in a+ openingThis is an added line, because I used a+ mode\nThis is a real new line, on the next lineMy first line in a+ openingThis is an added line, because I used a+ mode\nThis is a real new line, on the next line'

In [None]:
myfile.seek(0)

0

In [None]:
print(myfile.read())

My first line in a+ openingThis is an added line, because I used a+ mode
This is a real new line, on the next lineMy first line in a+ openingThis is an added line, because I used a+ mode
This is a real new line, on the next line


In [None]:
myfile.close()
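
As the output above shows, `.write()` does not add a newline, so consecutive appends run together unless `'\n'` is written explicitly. Here is a minimal sketch of appending one entry per line; it uses a temporary file named `log.txt`, a placeholder chosen for this example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "log.txt")  # hypothetical file name

    # 'a' creates the file if it does not exist yet.
    with open(path, "a", encoding="utf-8") as f:
        f.write("first entry\n")    # write() adds no newline on its own

    # Opening with 'a' again keeps the existing text and adds to the end.
    with open(path, "a", encoding="utf-8") as f:
        f.write("second entry\n")

    with open(path, encoding="utf-8") as f:
        appended = f.read()

print(repr(appended))   # 'first entry\nsecond entry\n'
```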

## Iterating through a File

In [None]:
with open('whoops.txt','r') as mynewfile:
    myvariable = mynewfile.readlines()

In [None]:
myvariable

['My first line in a+ openingThis is an added line, because I used a+ mode\n',
 'This is a real new line, on the next lineMy first line in a+ openingThis is an added line, because I used a+ mode\n',
 'This is a real new line, on the next line']
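
`.readlines()` loads every line into a list at once. A file object can also be looped over directly, which reads one line at a time. The sketch below assumes a small temporary file with made-up contents and collects the first word of each line, as the earlier `line.split()[0]` cell did.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "lines.txt")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        f.write("alpha one\nbeta two\ngamma three")

    first_words = []
    with open(path, encoding="utf-8") as f:
        # Iterating over the file object yields one line per loop pass.
        for number, line in enumerate(f, start=1):
            stripped = line.rstrip("\n")   # drop the trailing newline
            first_words.append(stripped.split()[0])
            print(number, stripped)

print(first_words)   # ['alpha', 'beta', 'gamma']
```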