# Working with Files
Python uses file objects to interact with external files on your computer. The underlying file can be any kind of file: an audio file, a text file, an email, an Excel document, and so on. Note: to work with many of these file types you will probably need to install extra libraries or modules, but they are easy to obtain (we will cover installing modules later in the course).

Python has a built-in open function that allows us to open and play with basic file types. First we will need a file though. We're going to use some IPython magic to create a text file!

## Creating a text file with IPython
The `%%writefile` cell magic used below is specific to Jupyter notebooks (IPython). Alternatively, you can create a simple .txt file in any text editor, such as Sublime Text.
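Outside Jupyter, one way to create the same file is with plain Python. This is a minimal sketch: the helper name `write_text_file` is hypothetical, and it writes the same two lines that the `%%writefile` cell produces. The `with` statement closes the file automatically when the block ends.

```python
def write_text_file(path, text):
    # 'w' mode creates the file, or empties it first if it already exists,
    # much like %%writefile reporting "Overwriting text_test.txt"
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


write_text_file(
    "text_test.txt",
    "In west Philidelphia, born and raised. \n"
    "On the playground is where I spent most of my days.\n",
)
```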

In [38]:
%%writefile text_test.txt 
In west Philidelphia, born and raised. 
On the playground is where I spent most of my days.

Overwriting text_test.txt


In [3]:
myfile = open('text_test.txt')

In [4]:
pwd

'C:\\Users\\gthom\\NLP'

In [5]:
myfile

<_io.TextIOWrapper name='text_test.txt' mode='r' encoding='cp1252'>

In [6]:
print(myfile)

<_io.TextIOWrapper name='text_test.txt' mode='r' encoding='cp1252'>


### Reading a file

In [7]:
myfile.read()


'In west Philidelphia, born and raised. \nOn the playground is where I spent most of my days.\n'

In [8]:
myfile.read()

''

In [9]:
myfile.seek(0)

0

In [10]:
myfile.read()

'In west Philidelphia, born and raised. \nOn the playground is where I spent most of my days.\n'

In [11]:
myfile.seek(0)

0

In [12]:
content = myfile.read()

In [13]:
content

'In west Philidelphia, born and raised. \nOn the playground is where I spent most of my days.\n'

In [14]:
content

'In west Philidelphia, born and raised. \nOn the playground is where I spent most of my days.\n'

In [15]:
print(content)

In west Philidelphia, born and raised. 
On the playground is where I spent most of my days.



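Always close a file when you are done with it. An open file is a bit like a USB drive that is still attached: closing it makes sure any buffered writes reach the disk, and if another program or device uses the file before it has been closed, the file could end up incomplete or corrupted.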

In [16]:
myfile.close()
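A file object reports whether it is still open through its `closed` attribute. The sketch below uses a throwaway temporary file (an assumption, so it does not touch `text_test.txt`) and a `try`/`finally` block so the file is closed even if reading fails; any I/O on a closed file raises `ValueError`.

```python
import os
import tempfile

# Create a throwaway file so the example does not depend on text_test.txt
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(path, "w", encoding="utf-8") as f:
    f.write("hello\n")

myfile = open(path, encoding="utf-8")
try:
    content = myfile.read()
finally:
    # The finally clause runs even if read() raises, so the file is always closed
    myfile.close()

print(myfile.closed)   # True
print(repr(content))   # 'hello\n'

try:
    myfile.read()      # any further I/O on a closed file fails
except ValueError as err:
    print("ValueError:", err)

os.remove(path)  # tidy up the temporary file
```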

In [23]:
myfile = open('text_test.txt')

In [24]:
myfile.readlines()

['In west Philidelphia, born and raised. \n',
 'On the playground is where I spent most of my days.\n']

In [25]:
myfile.seek(0)

0

In [26]:
mylines = myfile.readlines()

In [27]:
mylines

['In west Philidelphia, born and raised. \n',
 'On the playground is where I spent most of my days.\n']

In [28]:
for line in mylines:
    print(line[0])

I
O


In [30]:
for line in mylines:
    print(line.split()[0])

In
On


### Writing to a file

WARNING: Opening a file in `'w'` or `'w+'` mode erases its existing contents as soon as the file is opened, so whatever you write replaces the original text.
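If you want to avoid wiping a file by accident, one option is the exclusive-creation mode `'x'`, which raises `FileExistsError` instead of truncating an existing file. A small sketch; the helper `safe_create` and the temporary path are hypothetical.

```python
import os
import tempfile


def safe_create(path, text):
    """Write text to a new file, refusing to overwrite an existing one."""
    try:
        # 'x' fails if the file already exists, unlike 'w', which truncates it
        with open(path, "x", encoding="utf-8") as f:
            f.write(text)
        return True
    except FileExistsError:
        return False


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.txt")
    print(safe_create(path, "first version\n"))   # True
    print(safe_create(path, "second version\n"))  # False: file left untouched
    with open(path, encoding="utf-8") as f:
        print(f.read())                           # first version
```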

In [32]:
myfile = open('text_test.txt', 'w+')

In [33]:
myfile.read()

''

In [34]:
myfile.write('GETTING JIGGY WIT IT')

20

In [35]:
myfile.seek(0)

0

In [36]:
myfile.read()

'GETTING JIGGY WIT IT'

In [37]:
myfile.close()

### Append a file

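* Before running this section, re-run the `%%writefile` cell at the top of the notebook to restore the original text, which the `'w+'` example above overwrote.
* Mode `'a+'` opens the file for appending and reading: anything you write is added after the existing content instead of replacing it.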
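Note that `write()` never adds a line break of its own, which is why appended text can run straight into the previous line. A minimal sketch that appends whole lines by adding `'\n'` explicitly; the helper `append_lines` and the temporary file are hypothetical.

```python
import tempfile
from pathlib import Path


def append_lines(path, lines):
    # 'a' creates the file if needed and always writes after existing content;
    # write() adds no newline itself, so one is added to every line
    with open(path, "a", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "song.txt"
    path.write_text("Line one\n", encoding="utf-8")
    append_lines(path, ["Line two", "Line three"])
    print(path.read_text(encoding="utf-8").splitlines())
    # ['Line one', 'Line two', 'Line three']
```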

In [40]:
myfile = open('text_test.txt', 'a+')

In [41]:
myfile.write('Getting jiggy wit it!')

21

In [42]:
myfile.close()

In [43]:
newfile = open('text_test.txt')

In [45]:
newfile.read()

'In west Philidelphia, born and raised. \nOn the playground is where I spent most of my days.\nGetting jiggy wit it!'

In [46]:
newfile.write('nah,nah,nah,nah,nah,nah,nah')

UnsupportedOperation: not writable

In [47]:
newfile.close()

In [48]:
myfile = open('text_test.txt', 'a+')

In [49]:
myfile.write('nah,nah,nah,nah,nah,nah,nah')

27

In [50]:
myfile.seek(0)

0

In [51]:
myfile.read()

'In west Philidelphia, born and raised. \nOn the playground is where I spent most of my days.\nGetting jiggy wit it!nah,nah,nah,nah,nah,nah,nah'

In [52]:
myfile.write('\n What? You wanna ball with the kid?')

36

In [53]:
myfile.seek(0)

0

In [54]:
myfile.read()

'In west Philidelphia, born and raised. \nOn the playground is where I spent most of my days.\nGetting jiggy wit it!nah,nah,nah,nah,nah,nah,nah\n What? You wanna ball with the kid?'

In [55]:
myfile.close()

The `with` statement uses the file object as a context manager: the file is closed automatically when the indented block finishes, so you don't have to call `close()` yourself.
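The `with` block closes the file even if an exception is raised inside it. The sketch below also iterates over the file object directly, one line at a time, instead of loading every line with `readlines()`; the function name `first_words` is hypothetical. For the file contents shown below, `first_words('text_test.txt')` would return `['In', 'On', 'Getting', 'What?']`.

```python
def first_words(path):
    """Return the first word of each non-blank line in a text file."""
    words = []
    with open(path, encoding="utf-8") as f:
        # Iterating over the file object yields one line at a time
        for line in f:
            parts = line.split()
            if parts:  # skip blank lines, where split() returns []
                words.append(parts[0])
    # The with block has ended, so the file is already closed here
    return words
```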

In [56]:
with open('text_test.txt', 'r') as newfile:
    myvar = newfile.readlines()

In [57]:
myvar

['In west Philidelphia, born and raised. \n',
 'On the playground is where I spent most of my days.\n',
 'Getting jiggy wit it!nah,nah,nah,nah,nah,nah,nah\n',
 ' What? You wanna ball with the kid?']

### Formatting datetime

https://strftime.org/
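The format codes listed at strftime.org work both after the colon in an f-string and with the `datetime.strftime()` method. A short sketch with the same date; month and weekday names (`%b`, `%B`, `%A`) follow the current locale, and the outputs shown assume the default English (C) locale.

```python
from datetime import datetime

today = datetime(year=2022, month=1, day=3)

# In an f-string, the format codes go after the colon
print(f"{today:%b %d, %Y}")           # Jan 03, 2022
# strftime() accepts the same codes and returns a string
print(today.strftime("%Y-%m-%d"))     # 2022-01-03
print(today.strftime("%A %d/%m/%y"))  # Monday 03/01/22
```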



In [10]:
from datetime import datetime

In [11]:
today = datetime(day=3,month=1,year=2022)

In [12]:
print(f"The date is {today}")

The date is 2022-01-03 00:00:00


In [13]:
print(f"{today:%B}")

January


In [14]:
print(f"{today:%d %B}")

03 January


In [15]:
print(f"{today:%d %B %Y}")

03 January 2022

## Working with PDF Files

You may need to read in text data from a PDF file.

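We can use the PyPDF2 library to read in text data from a PDF file. The cells below use the older PyPDF2 1.x names (`PdfFileReader`, `numPages`, `getPage`, `extractText`, `PdfFileWriter`, `addPage`). In PyPDF2 3.0 and later these were replaced by `PdfReader`, `len(reader.pages)`, `reader.pages[i]`, `page.extract_text()`, `PdfWriter` and `add_page`, and the project now continues under the name `pypdf`.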

**Keep in mind that not all PDF files have text that can be extracted**

Some PDFs are created by scanning paper documents instead of being exported from a word processor such as Microsoft Word.
Scanned PDFs are essentially images of the pages, so there may be no text layer to extract at all; recovering the text usually requires optical character recognition (OCR) software.

PyPDF2 is designed to extract text from PDFs that were created directly by a word processor.
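As a sketch, assuming the newer `pypdf` package (the maintained successor to PyPDF2, installed with `pip install pypdf`) is available, the reading steps might look like this. The function name `extract_pdf_text` is hypothetical; the import sits inside the function so the rest of the notebook still runs if `pypdf` is not installed.

```python
def extract_pdf_text(path):
    """Return a list with the extracted text of every page in a PDF."""
    # Imported here because pypdf is an optional third-party dependency
    from pypdf import PdfReader

    reader = PdfReader(path)  # accepts a file path or a binary file object
    # Pages from scanned PDFs may yield little or no text
    return [page.extract_text() for page in reader.pages]
```

For example, `len(extract_pdf_text('retail-evidential-pack-april-2016.pdf'))` should match the page count reported by `numPages` below.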

In [2]:
import PyPDF2

### Opening a PDF in python

In [4]:
myfile = open('retail-evidential-pack-april-2016.pdf', mode='rb')

In [5]:
pdf_reader = PyPDF2.PdfFileReader(myfile)

In [6]:
pdf_reader.numPages

20

In [7]:
page_one = pdf_reader.getPage(0)

In [8]:
page_one.extractText()

'Retail Crime Evidential Pack\nTime, Day, Date of Incident\nIncident Number Crime Number Full Name of Person \n Completing PackOrganisation '

In [9]:
print(page_one.extractText())

Retail Crime Evidential Pack
Time, Day, Date of Incident
Incident Number Crime Number Full Name of Person 
 Completing PackOrganisation 


In [10]:
my_text = page_one.extractText()

In [11]:
myfile.close()

### Appending a PDF

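**NOTE**: Editing the text that is already on a PDF page is often impractical, because the layout depends on many variables such as font and size. A simpler approach is to append existing pages to a new PDF, as below, where the first page is copied into a new file.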
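A minimal sketch of copying selected pages into a new file, again assuming the `pypdf` package is installed; `copy_pages` is a hypothetical helper. Calling `copy_pages('retail-evidential-pack-april-2016.pdf', 'NEW_RETAIL_PDF.pdf', [0])` would produce a single-page file like the one created in the cells below.

```python
def copy_pages(src_path, dst_path, page_indices):
    """Copy the given zero-based pages of src_path into a new PDF at dst_path."""
    # Imported here because pypdf is an optional third-party dependency
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src_path)
    writer = PdfWriter()
    for i in page_indices:
        writer.add_page(reader.pages[i])
    # PDFs are binary, so the output file is opened in 'wb' mode
    with open(dst_path, "wb") as out:
        writer.write(out)
    return len(writer.pages)
```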

In [12]:
f = open('retail-evidential-pack-april-2016.pdf', 'rb')

In [13]:
pdf_reader = PyPDF2.PdfFileReader(f)

In [14]:
first_page = pdf_reader.getPage(0)

In [15]:
pdf_writer = PyPDF2.PdfFileWriter()

In [16]:
pdf_writer.addPage(first_page)

In [17]:
pdf_output = open('NEW_RETAIL_PDF.pdf', 'wb')

In [18]:
pdf_writer.write(pdf_output)

In [19]:
pdf_output.close()

In [20]:
f.close()

In [21]:
brand_new = open('NEW_RETAIL_PDF.pdf', 'rb')

pdf_reader = PyPDF2.PdfFileReader(brand_new)

In [23]:
pdf_reader.numPages

1

In [24]:
brand_new.close()

### Extracting text from all pages

In [25]:
f = open('retail-evidential-pack-april-2016.pdf', 'rb')

pdf_text = []

pdf_reader = PyPDF2.PdfFileReader(f)

for p in range(pdf_reader.numPages):
    page = pdf_reader.getPage(p)
    
    pdf_text.append(page.extractText())
    
f.close()
    

In [26]:
pdf_text

['Retail Crime Evidential Pack\nTime, Day, Date of Incident\nIncident Number Crime Number Full Name of Person \n Completing PackOrganisation ',
 'GuidanceRules for Written Statements\nALWAYS:\nŁ  Be accurate and truthful in what you saw and you heard\n Ł  Include relevant information\n Ł  Always use black ink\n Ł  Write in a chronological sequence\n Ł  Should be legible and neat\n Ł  Use plain, simple English\n Ł  Use block capitals for names/places\n \n \n Ł  Always line out mistakes with a single line and initial (no tippex)\n \nNEVER:Ł  Include your opinion\n Ł  Use jargon or abbreviations\n Ł  Write too much\n \nMG11 Witness Statement\n\n to complete the following:   \nŁ Occupation Ł  Dates to be avoided\n  \n\nHowever, there is capacity for this information to be recorded on the back of the form, in case the CPS or Police need to \n\n\n \n',
 'Witness Statement\n\n\nAge if under 18:  \n   \n(if over 18 insert ‚over 18™)\n Occupation:    This statement (consisting of page(s) each s

In [27]:
for page in pdf_text:
    print(page)
    print('\n')
    print('\n')
    print('\n')
    print('\n')
    print('\n')

Retail Crime Evidential Pack
Time, Day, Date of Incident
Incident Number Crime Number Full Name of Person 
 Completing PackOrganisation 










GuidanceRules for Written Statements
ALWAYS:
Ł  Be accurate and truthful in what you saw and you heard
 Ł  Include relevant information
 Ł  Always use black ink
 Ł  Write in a chronological sequence
 Ł  Should be legible and neat
 Ł  Use plain, simple English
 Ł  Use block capitals for names/places
 
 
 Ł  Always line out mistakes with a single line and initial (no tippex)
 
NEVER:Ł  Include your opinion
 Ł  Use jargon or abbreviations
 Ł  Write too much
 
MG11 Witness Statement

 to complete the following:   
Ł Occupation Ł  Dates to be avoided
  

However, there is capacity for this information to be recorded on the back of the form, in case the CPS or Police need to 


 











Witness Statement


Age if under 18:  
   
(if over 18 insert ‚over 18™)
 Occupation:    This statement (consisting of page(s) each signed by me) is true to th