Python provides the basic functions and methods needed to manipulate files out of the box. You can do most file manipulation using a file object.

**The open Function**
Before you can read or write a file, you have to open it using Python's built-in open() function. This function returns a file object, which you then use to call the other methods associated with the file.

**Syntax**
```python
file_object = open(file_name[, access_mode][, buffering])
```

Here are the parameter details −

**file_name** − The file_name argument is a string value that contains the name of the file that you want to access.

**access_mode** − The access_mode determines the mode in which the file has to be opened, i.e., read, write, append, etc. A complete list of possible values is given below in the table. This is an optional parameter and the default file access mode is read (r).

**buffering** − If the buffering value is set to 0, no buffering takes place (this is allowed only in binary mode). If the buffering value is 1, line buffering is performed while accessing the file (text mode only). If you specify an integer greater than 1, buffering is performed with the indicated buffer size. If the value is negative, the buffer size is the system default (this is the default behavior).
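The parameters above can be exercised in a short sketch. It assumes a throwaway file in a temporary directory (the name `demo.txt` is hypothetical) and uses the `with` statement so every file is closed automatically. It shows that the access mode defaults to `"r"`, that `buffering=0` works in binary mode, and that Python rejects unbuffered text I/O with `ValueError`.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory, so the sketch is self-contained
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Default buffering (buffering=-1) uses the system default buffer size
with open(path, "w") as f:
    f.write("first line\nsecond line\n")

# access_mode is optional: open(path) behaves like open(path, "r")
with open(path) as f:
    default_mode = f.mode
    text = f.read()

# buffering=0 (unbuffered) is accepted in binary mode
with open(path, "rb", buffering=0) as f:
    raw = f.read()

# buffering=1 requests line buffering, which is meaningful in text mode
with open(path, "a", buffering=1) as f:
    f.write("third line\n")

# Unbuffered text I/O is rejected with ValueError
try:
    open(path, "r", buffering=0)
    unbuffered_text_ok = True
except ValueError:
    unbuffered_text_ok = False

print(default_mode)                    # r
print(raw.startswith(b"first line"))   # True
print(unbuffered_text_ok)              # False
```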

Here is a list of the different modes of opening a file −

In [2]:
from IPython.display import HTML, display

display(HTML("""<table class="table table-bordered">
	<tbody>
		<tr>
			<th style="text-align:center;">Sr.No.</th>
			<th style="text-align:center;">Mode &amp; Description</th>
		</tr>
		<tr>
			<td class="ts">1</td>
			<td>
				<p>
					<b>r</b>
				</p>
				<p>Opens a file for reading only. The file pointer is placed at the beginning of the file. This is the default mode.</p>
			</td>
		</tr>
		<tr>
			<td class="ts">2</td>
			<td>
				<p>
					<b>rb</b>
				</p>
				<p>Opens a file for reading only in binary format. The file pointer is placed at the beginning of the file.</p>
			</td>
		</tr>
		<tr>
			<td class="ts">3</td>
			<td>
				<p>
					<b>r+</b>
				</p>
				<p>Opens a file for both reading and writing. The file pointer is placed at the beginning of the file.</p>
			</td>
		</tr>
		<tr>
			<td class="ts">4</td>
			<td>
				<p>
					<b>rb+</b>
				</p>
				<p>Opens a file for both reading and writing in binary format. The file pointer is placed at the beginning of the file.</p>
			</td>
		</tr>
		<tr>
			<td class="ts">5</td>
			<td>
				<p>
					<b>w</b>
				</p>
				<p>Opens a file for writing only. Overwrites the file if the file exists. If the file does not exist, creates a new file for writing.</p>
			</td>
		</tr>
		<tr>
			<td class="ts">6</td>
			<td>
				<p>
					<b>wb</b>
				</p>
				<p>Opens a file for writing only in binary format. Overwrites the file if the file exists. If the file does not exist, creates a new file for writing.</p>
			</td>
		</tr>
		<tr>
			<td class="ts">7</td>
			<td>
				<p>
					<b>w+</b>
				</p>
				<p>Opens a file for both writing and reading. Overwrites the existing file if the file exists. If the file does not exist, creates a new file for reading and writing.</p>
			</td>
		</tr>
		<tr>
			<td class="ts">8</td>
			<td>
				<p>
					<b>wb+</b>
				</p>
				<p>Opens a file for both writing and reading in binary format. Overwrites the existing file if the file exists. If the file does not exist, creates a new file for reading and writing.</p>
			</td>
		</tr>
		<tr>
			<td class="ts">9</td>
			<td>
				<p>
					<b>a</b>
				</p>
				<p>Opens a file for appending. The file pointer is at the end of the file if the file exists. That is, the file is in the append mode. If the file does not exist, it creates a new file for writing.</p>
			</td>
		</tr>
		<tr>
			<td class="ts">10</td>
			<td>
				<p>
					<b>ab</b>
				</p>
				<p>Opens a file for appending in binary format. The file pointer is at the end of the file if the file exists. That is, the file is in the append mode. If the file does not exist, it creates a new file for writing.</p>
			</td>
		</tr>
		<tr>
			<td class="ts">11</td>
			<td>
				<p>
					<b>a+</b>
				</p>
				<p>Opens a file for both appending and reading. The file pointer is at the end of the file if the file exists. The file opens in the append mode. If the file does not exist, it creates a new file for reading and writing.</p>
			</td>
		</tr>
		<tr>
			<td class="ts">12</td>
			<td>
				<p>
					<b>ab+</b>
				</p>
				<p>Opens a file for both appending and reading in binary format. The file pointer is at the end of the file if the file exists. The file opens in the append mode. If the file does not exist, it creates a new file for reading and writing.</p>
			</td>
		</tr>
	</tbody>
</table>"""))

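The practical difference between the modes in the table is easiest to see by running them against one file. This sketch assumes a hypothetical scratch file `modes.txt` in a temporary directory. It shows that `"a"` appends, `"r+"` keeps the existing content and writes from position 0, `"a+"` starts with the pointer at the end, and `"w"` discards whatever was there.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demonstration
path = os.path.join(tempfile.mkdtemp(), "modes.txt")

with open(path, "w") as f:      # "w" creates the file (or truncates an existing one)
    f.write("abc")

with open(path, "a") as f:      # "a" writes after the existing content
    f.write("def")

with open(path) as f:           # "r" (the default) reads from the beginning
    after_append = f.read()

with open(path, "r+") as f:     # "r+" keeps the content; the pointer starts at 0
    f.write("X")                # overwrites the first character in place
with open(path) as f:
    after_r_plus = f.read()

with open(path, "a+") as f:     # "a+" starts with the pointer at the end of the file
    end_position = f.tell()
    f.seek(0)                   # seek back to read the existing content
    a_plus_content = f.read()

with open(path, "w") as f:      # "w" again: the old content is discarded
    pass
with open(path) as f:
    after_w = f.read()

print(after_append)   # abcdef
print(after_r_plus)   # Xbcdef
print(end_position)   # 6
print(repr(after_w))  # ''
```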


**The file Object Attributes**
Once a file is opened and you have a file object, you can get various information about that file.

Here is a list of all the attributes related to a file object −

In [4]:
display(HTML("""<table class="table table-bordered">
<tbody><tr>
<th style="text-align:center;width:5%">Sr.No.</th>
<th style="text-align:center;">Attribute &amp; Description</th>
</tr>
<tr>
<td class="ts">1</td>
<td><p><b>file.closed</b></p>
<p>Returns True if the file is closed, False otherwise.</p></td>
</tr>
<tr>
<td class="ts">2</td>
<td><p><b>file.mode</b></p>
<p>Returns the access mode with which the file was opened.</p></td>
</tr>
<tr>
<td class="ts">3</td>
<td><p><b>file.name</b></p>
<p>Returns the name of the file.</p></td>
</tr>
</tbody></table>"""))



In [11]:
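# Open a file for writing in binary mode ("wb" creates it or truncates existing content)
fo = open("simple.txt", "wb")
print("Name of the file: ", fo.name)
print("Closed or not : ", fo.closed)
print("Opening mode : ", fo.mode)
# Binary mode writes bytes; this gives the read example below something to read
fo.write(b"Sample Content.\n")
fo.close()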

Name of the file:  simple.txt
Closed or not :  False
Opening mode :  wb
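Calling `close()` by hand is easy to forget. A common alternative, sketched here with a hypothetical file name `attributes_demo.txt`, is the `with` statement, which closes the file when the block ends. The `closed` attribute from the table above confirms this.

```python
import os

# "attributes_demo.txt" is a hypothetical file name used only here
with open("attributes_demo.txt", "w") as fo:
    inside_closed = fo.closed          # False while the block runs
    print("Inside the with block, closed:", fo.closed)

# Leaving the block closes the file, even if an exception was raised inside it
print("After the with block, closed:", fo.closed)   # True

os.remove("attributes_demo.txt")       # clean up the demo file
```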


In [18]:
# Read a file ("r" suffices for reading; "r+" would also allow writing)
fo = open("simple.txt", "r")
content = fo.read()
print(content)
fo.close()

Sample Content.


Write example

In [14]:
# Open a file for writing (creates it, or truncates an existing one)
fo = open("foo.txt", "w")
fo.write("Python is a great language.\nYeah, it's great!!\n")

# Close the opened file
fo.close()
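Reading text back is often done line by line rather than with a single `read()`. This sketch writes two illustrative lines to `foo.txt` (overwriting it) so it runs on its own, then shows `readline()`, `readlines()`, and plain iteration over the file object.

```python
# Illustrative content; "foo.txt" is overwritten by this sketch
lines_to_write = ["Python is a great language.\n", "Yeah, it's great!!\n"]

with open("foo.txt", "w") as fo:
    fo.writelines(lines_to_write)   # writelines() does not add newlines itself

with open("foo.txt") as fo:
    first = fo.readline()           # one line, including its trailing "\n"
    rest = fo.readlines()           # the remaining lines as a list

with open("foo.txt") as fo:
    # Iterating over the file yields one line at a time without loading it all
    stripped = [line.rstrip("\n") for line in fo]

print(repr(first))   # 'Python is a great language.\n'
print(rest)          # ["Yeah, it's great!!\n"]
print(stripped)      # ['Python is a great language.', "Yeah, it's great!!"]
```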

**Regular Expressions**
A **regular expression** is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. Regular expressions are widely used in the UNIX world.

The module **re** provides full support for Perl-like regular expressions in Python. The **re** module raises the exception **re.error** if an error occurs while compiling or using a regular expression.
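As an illustration of the `re.error` behavior just described, this sketch compiles a pattern with an unbalanced parenthesis. It uses `re.compile()`, which turns a pattern string into a reusable pattern object and validates it immediately. The exact error message wording may vary between Python versions, so the sketch only checks that the exception is raised.

```python
import re

# An unbalanced parenthesis is not a valid pattern, so compiling it raises re.error
try:
    re.compile(r"(unclosed")
    error_raised = False
except re.error as exc:
    error_raised = True
    print("re.error:", exc)   # the message describes what went wrong and where

print(error_raised)  # True
```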

**Basic patterns that match single chars**

In [21]:
display(HTML(r"""<table class="table table-bordered">
<tbody><tr>
<th>Sr.No.</th>
<th style="text-align:center;">Expression &amp; Matches</th>
</tr>
<tr>
<td class="ts">1</td>
<td><p><b>a, X, 9, &lt;</b></p>
<p>ordinary characters just match themselves exactly.</p>
</td>
</tr>
<tr>
<td class="ts">2</td>
<td><p><b>. (a period)</b></p>
<p>matches any single character except newline '\n'</p>
</td>
</tr>
<tr>
<td class="ts">3</td>
<td><p><b>\w</b></p>
<p>matches a "word" character: a letter, digit, or underscore [a-zA-Z0-9_].</p>
</td>
</tr>
<tr>
<td class="ts">4</td>
<td><p><b>\W</b></p>
<p>matches any non-word character.</p>
</td>
</tr>
<tr>
<td class="ts">5</td>
<td><p><b>\b</b></p>
<p>matches the boundary between a word character and a non-word character</p>
</td>
</tr>
<tr>
<td class="ts">6</td>
<td><p><b>\s</b></p>
<p>matches a single whitespace character: space, newline, return, tab, form feed, or vertical tab</p>
</td>
</tr>
<tr>
<td class="ts">7</td>
<td><p><b>\S</b></p>
<p>matches any non-whitespace character.</p>
</td>
</tr>
<tr>
<td class="ts">8</td>
<td><p><b>\t, \n, \r</b></p>
<p>tab, newline, return</p>
</td>
</tr>
<tr>
<td class="ts">9</td>
<td><p><b>\d</b></p>
<p>decimal digit [0-9]</p>
</td>
</tr>
<tr>
<td class="ts">10</td>
<td><p><b>^</b></p>
<p>matches start of the string</p>
</td>
</tr>
<tr>
<td class="ts">11</td>
<td><p><b>$</b></p>
<p>matches the end of the string</p>
</td>
</tr>
<tr>
<td class="ts">12</td>
<td><p><b>\</b></p>
<p>escapes a special character so that it matches literally, e.g. \. matches a period.</p>
</td>
</tr>
</tbody></table>"""))

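The basic patterns above can be tried directly. This sketch uses `re.findall()`, which returns all non-overlapping matches as a list, and `re.search()`, which finds the first match anywhere in the string. The sample strings are illustrative.

```python
import re

text = "Call 555-1234 or email bob_99@example.com"

digits = re.findall(r"\d", "a1b22")                      # ['1', '2', '2']
words = re.findall(r"\w+", "bob_99@example.com")         # ['bob_99', 'example', 'com']
# "concat" has no word boundary before its "cat", so only the standalone words match
whole_cat = re.findall(r"\bcat\b", "cat concat cat.")    # ['cat', 'cat']
spaces = re.findall(r"\s", "a b\tc\nd")                  # [' ', '\t', '\n']
# "." matches any character except a newline
dot = re.findall(r"a.c", "abc a\nc")                     # ['abc']
starts_with_call = re.search(r"^Call", text) is not None  # True
ends_with_com = re.search(r"com$", text) is not None      # True
# The backslash makes "." match a literal period
periods = re.findall(r"\.", "a.b.c")                     # ['.', '.']
```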


**The match Function**
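This function attempts to match a regular expression pattern at the beginning of a string, with optional flags. It returns a match object on success and None on failure.

Here is the syntax for this function: `re.match(pattern, string, flags=0)`. Its parameters are: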

In [24]:
display(HTML("""<table class="table table-bordered">
<tbody><tr>
<th>Sr.No.</th>
<th style="text-align:center;">Parameter &amp; Description</th>
</tr>
<tr>
<td class="ts">1</td>
<td><p><b>pattern</b></p>
<p>This is the regular expression to be matched.</p></td>
</tr>
<tr>
<td class="ts">2</td>
<td><p><b>string</b></p>
<p>This is the string to be matched against the pattern, starting at its beginning.</p></td>
</tr>
<tr>
<td class="ts">3</td>
<td><p><b>flags</b></p>
<p>You can specify different flags using bitwise OR (|). These are modifiers, which are listed in the table below.</p></td>
</tr>
</tbody></table>"""))



In [25]:
import re

line = "Cats are smarter than dogs"

# (.*) captures greedily up to " are "; (.*?) captures lazily up to the next space
matchObj = re.match(r'(.*) are (.*?) .*', line, re.M | re.I)

if matchObj:
    print("matchObj.group() : ", matchObj.group())
    print("matchObj.group(1) : ", matchObj.group(1))
    print("matchObj.group(2) : ", matchObj.group(2))
else:
    print("No match!!")

matchObj.group() :  Cats are smarter than dogs
matchObj.group(1) :  Cats
matchObj.group(2) :  smarter
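Because `re.match()` only tries the pattern at the start of the string, a word that appears later is not found. This sketch makes that concrete on the same sentence and, for contrast, uses `re.search()` from the same module, which scans the whole string. It also shows that flags such as `re.I` still apply.

```python
import re

line = "Cats are smarter than dogs"

# re.match only tries the pattern at the very start of the string
at_start = re.match(r"Cats", line)           # a match object
in_middle = re.match(r"dogs", line)          # None: "dogs" is not at the start
ignore_case = re.match(r"cats", line, re.I)  # flags still apply: matches "Cats"

# For contrast, re.search scans the whole string for the first match
anywhere = re.search(r"dogs", line)

print(at_start.group())      # Cats
print(in_middle)             # None
print(ignore_case.group())   # Cats
print(anywhere.span())       # (22, 26)
```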


In [26]:
import re

phone = "2004-959-559 # This is Phone Number"

# Delete Python-style comments
num = re.sub(r'#.*$', "", phone)
print ("Phone Num : ", num)

# Remove anything other than digits
num = re.sub(r'\D', "", phone)    
print ("Phone Num : ", num)

Phone Num :  2004-959-559 
Phone Num :  2004959559


**Regular Expression Modifiers: Option Flags**
Regular expression functions accept an optional flags argument that controls various aspects of matching. You can combine multiple flags using bitwise OR (|).

In [28]:
display(HTML(r"""<table class="table table-bordered">
<tbody><tr>
<th>Sr.No.</th>
<th style="text-align:center;">Modifier &amp; Description</th>
</tr>
<tr>
<td class="ts">1</td>
<td><p><b>re.I</b></p>
<p>Performs case-insensitive matching.</p></td>
</tr>
<tr>
<td class="ts">2</td>
<td><p><b>re.L</b></p>
<p>Interprets words according to the current locale. This interpretation affects the alphabetic group (\w and \W), as well as word boundary behavior (\b and \B). In Python 3, this flag can be used only with bytes patterns.</p></td>
</tr>
<tr>
<td class="ts">3</td>
<td><p><b>re.M</b></p>
<p>Makes $ match the end of a line (not just the end of the string) and makes ^ match the start of any line (not just the start of the string).</p></td>
</tr>
<tr>
<td class="ts">4</td>
<td><p><b>re.S</b></p>
<p>Makes a period (dot) match any character, including a newline.</p></td>
</tr>
<tr>
<td class="ts">5</td>
<td><p><b>re.U</b></p>
<p>Interprets letters according to the Unicode character set. This flag affects the behavior of \w, \W, \b, \B. In Python 3 it is redundant for str patterns, which are Unicode-aware by default.</p></td>
</tr>
<tr>
<td class="ts">6</td>
<td><p><b>re.X</b></p>
<p>Permits more readable ("verbose") regular expression syntax. It ignores whitespace (except inside a set [] or when escaped by a backslash) and treats unescaped # as a comment marker.</p></td>
</tr>
</tbody></table>"""))

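The flags in the table change matching in ways that are easy to check on a two-line string. This sketch (the sample text is illustrative) contrasts each flag with the default behavior and combines two flags with `|`.

```python
import re

text = "first line\nSecond line"

# re.I: case-insensitive matching
ci = re.findall(r"second", text, re.I)              # ['Second']

# re.M: ^ matches at the start of every line, not only the string
starts = re.findall(r"^\w+", text, re.M)            # ['first', 'Second']
starts_no_m = re.findall(r"^\w+", text)             # ['first']

# re.S: "." also matches the newline
dotall = re.search(r"line.Second", text, re.S)      # matches across "\n"
no_dotall = re.search(r"line.Second", text)         # None

# re.X: whitespace and # comments inside the pattern are ignored
verbose = re.compile(r"""
    (\w+)     # a word
    \s+       # whitespace
    line      # the literal word "line"
""", re.X)
words = verbose.findall(text)                       # ['first', 'Second']

# Flags combine with bitwise OR
combined = re.findall(r"^second", text, re.I | re.M)  # ['Second']
```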


**Regular Expression Patterns**
Except for the special characters (metacharacters) **(+ ? . * ^ $ ( ) [ ] { } | \)**, all characters match themselves. You can escape a special character by preceding it with a backslash.
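Escaping can be done by hand or with `re.escape()`, which backslash-escapes every special character in a string. This sketch (the price string is illustrative) also shows why patterns are usually written as raw strings: in a normal string literal `"\b"` is a single backspace character, not a backslash followed by `b`.

```python
import re

price = "Total: $9.99 (approx.)"

# "$", "." and "(" are metacharacters, so escape them with a backslash
escaped = re.search(r"\$9\.99 \(approx\.\)", price)

# re.escape() adds the backslashes for you when the text comes from elsewhere
literal = "$9.99 (approx.)"
auto = re.search(re.escape(literal), price)

# Raw strings (r"...") keep backslashes intact; without r, "\b" is a backspace
print(len("\b"), len(r"\b"))   # 1 2
print(escaped.group())         # $9.99 (approx.)
print(auto.group())            # $9.99 (approx.)
```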

The following tables list some of the regular expression syntax available in Python.

Character classes

In [31]:
display(HTML("""<table class="table table-bordered">
<tbody><tr>
<th width="8%">Sr.No.</th>
<th style="text-align:center;">Example &amp; Description</th>
</tr>
<tr>
<td class="ts">1</td>
<td><p><b>[Pp]ython</b></p>
<p>Match "Python" or "python"</p></td>
</tr>
<tr>
<td class="ts">2</td>
<td><p><b>rub[ye]</b></p>
<p>Match "ruby" or "rube"</p></td>
</tr>
<tr>
<td class="ts">3</td>
<td><p><b>[aeiou]</b></p>
<p>Match any one lowercase vowel</p></td>
</tr>
<tr>
<td class="ts">4</td>
<td><p><b>[0-9]</b></p>
<p>Match any digit; same as [0123456789]</p></td>
</tr>
<tr>
<td class="ts">5</td>
<td><p><b>[a-z]</b></p>
<p>Match any lowercase ASCII letter</p></td>
</tr>
<tr>
<td class="ts">6</td>
<td><p><b>[A-Z]</b></p>
<p>Match any uppercase ASCII letter</p></td>
</tr>
<tr>
<td class="ts">7</td>
<td><p><b>[a-zA-Z0-9]</b></p>
<p>Match any ASCII letter or digit</p></td>
</tr>
<tr>
<td class="ts">8</td>
<td><p><b>[^aeiou]</b></p>
<p>Match anything other than a lowercase vowel</p></td>
</tr>
<tr>
<td class="ts">9</td>
<td><p><b>[^0-9]</b></p>
<p>Match anything other than a digit</p></td>
</tr>
</tbody></table>"""))

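Several of the character classes above can be checked against one illustrative sentence. The sketch uses `re.findall()` to list matches and `re.sub()` (shown earlier in the phone-number example) to delete everything a negated class matches.

```python
import re

text = "Python and python, ruby or rube, R2-D2"

names = re.findall(r"[Pp]ython", text)              # ['Python', 'python']
rubies = re.findall(r"rub[ye]", text)                # ['ruby', 'rube']
vowels = re.findall(r"[aeiou]", "Ruby")              # ['u']
digits = re.findall(r"[0-9]", text)                  # ['2', '2']
# [^0-9] matches every non-digit, so removing those leaves only the digits
only_digits = re.sub(r"[^0-9]", "", "R2-D2")         # '22'
alnum_runs = re.findall(r"[a-zA-Z0-9]+", "R2-D2")    # ['R2', 'D2']
```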


Special Character Classes

In [32]:
display(HTML(r"""<table class="table table-bordered">
<tbody><tr>
<th width="8%">Sr.No.</th>
<th style="text-align:center;">Example &amp; Description</th>
</tr>
<tr>
<td class="ts">1</td>
<td><p><b>.</b></p>
<p>Match any character except newline</p></td>
</tr>
<tr>
<td class="ts">2</td>
<td><p><b>\d</b></p>
<p>Match a digit: [0-9] (for str patterns, any Unicode decimal digit unless re.ASCII is set)</p></td>
</tr>
<tr>
<td class="ts">3</td>
<td><p><b>\D</b></p>
<p>Match a nondigit: [^0-9]</p></td>
</tr>
<tr>
<td class="ts">4</td>
<td><p><b>\s</b></p>
<p>Match a whitespace character: [ \t\n\r\f\v]</p></td>
</tr>
<tr>
<td class="ts">5</td>
<td><p><b>\S</b></p>
<p>Match nonwhitespace: [^ \t\n\r\f\v]</p></td>
</tr>
<tr>
<td class="ts">6</td>
<td><p><b>\w</b></p>
<p>Match a single word character: [A-Za-z0-9_] (for str patterns, Unicode letters and digits also match unless re.ASCII is set)</p></td>
</tr>
<tr>
<td class="ts">7</td>
<td><p><b>\W</b></p>
<p>Match a nonword character: [^A-Za-z0-9_]</p></td>
</tr>
</tbody></table>"""))

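The bracketed equivalents in this table describe ASCII text. For str patterns, Python 3 matches `\w` and `\d` against Unicode by default, and the `re.ASCII` flag restores the ASCII-only meaning. This sketch shows that difference and the complement classes `\D`, `\W`, and `\S`. The sample strings are illustrative.

```python
import re

# In Python 3, str patterns are Unicode-aware by default
words_unicode = re.findall(r"\w+", "Caf\u00e9 au lait")          # ['Caf\u00e9', 'au', 'lait']
# re.ASCII restricts \w, \d and \s to their ASCII meanings
words_ascii = re.findall(r"\w+", "Caf\u00e9 au lait", re.ASCII)  # ['Caf', 'au', 'lait']

# "\u0663" is ARABIC-INDIC DIGIT THREE, a Unicode decimal digit
digits_unicode = re.findall(r"\d", "3 and \u0663")               # ['3', '\u0663']
digits_ascii = re.findall(r"\d", "3 and \u0663", re.ASCII)       # ['3']

# \D, \W and \S are the complements of \d, \w and \s
phone_digits = re.sub(r"\D", "", "Tel: 555-1234")                # '5551234'
tokens = re.split(r"\W+", "one, two;three")                      # ['one', 'two', 'three']
non_space = re.findall(r"\S+", "a b\tc")                         # ['a', 'b', 'c']
```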


**Tasks:**

- Open a text file and read its content.
- Create a text file and write new content to it.
- Open a CSV file and read its content.
- Install pandas and use it to read text and CSV files.
- Build a regex to match an email address.
- Build a regex to find your name in a string.