### String Comparisons

We now examine how to compare strings and introduce a few comparison operators.

#### Equality comparison

Let's first examine how we can check if two strings are identical. For this comparison, we need the equality operator `==`.

Let's see how equality comparisons work in Python:

In [None]:
str1 = "hello"

In [None]:
print(str1 == "hello")

In [None]:
print(str1 == "Hello")

Notice that **capitalization matters** when comparing strings in Python. If we want to make the comparison case-insensitive we typically first convert both sides of the equality to the same case:

In [None]:
print(str1.lower() == "Hello".lower() )

The opposite of the equality operator is the inequality operator, `!=`. For example:

In [None]:
email1 = "profesor@nyu.edu"
email2 = "professor@stern.nyu.edu"
print("Are the emails different?", email1 != email2 )

**Exercise:** Correct `email1` to match `email2`, using string slicing and concatenation.

**Answer:**
<span style ="color:white"> email1 = email1[:6] + 's' + email1[6:9] + 'stern.' + email1[9:]

#### Ordering Strings

Strings can also be compared for order, using operators such as `<` and `>`. When we compare strings, the string that is "smaller" is, roughly, the one that comes first in the dictionary. Let's see an example:

In [None]:
name1 = 'Abraham'
name2 = 'Bill'

# Abraham is lexicographically before Bill
print(name1 < name2)

In [None]:
name1 = 'William'
name2 = 'Bill'

# William is lexicographically after Bill
print(name1 < name2)

Notice though the following, where the capitalization of `Bill` changes:

In [None]:
name1 = 'William'
name2 = 'bill'

# William is lexicographically before bill
print(name1 < name2)

This happens because the order is not simply the order in which we would encounter words in the dictionary. Technically, strings are compared character by character, based on the position of each character in the ASCII (or Unicode) table, where all uppercase letters come before all lowercase letters. Here is the ASCII table:

<img src = "https://www.asciitable.com/index/asciifull.gif">
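As a quick check, the built-in `ord` function returns the code point of a single character, so we can look up positions without consulting the table. The sketch below reuses the names from the earlier example; the variables `upper_w` and `lower_b` are introduced here only for illustration.

```python
# Code points of the first characters of 'William' and 'bill'
upper_w = ord('W')   # uppercase letters occupy 65-90
lower_b = ord('b')   # lowercase letters occupy 97-122

print(upper_w, lower_b)
# The first characters differ, so they alone decide the comparison
print(upper_w < lower_b)
print('William' < 'bill')
```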

For example, if we take the strings below and sort them, take a look at the resulting order:

In [None]:
# Space, followed by numbers, followed by uppercase, followed by lowercase
sorted(['Bill', '  ZZ TOP!!! ', 'HAHA', 'lol', 'LOL!', 'ZZZZZ', 'zzzzz', '123', '345'])

In [None]:
# Example of string comparison
# See ASCII table at http://www.asciitable.com/ for character order (FYI)

name1 = 'Abe'
name2 = 'Bill'

# Abe is lexicographically before Bill
print(name1 < name2)

name1 = 'abe'
name2 = 'Bill'
# However 'abe' is lexicographically after Bill (which starts with an uppercase letter)
print(name1 < name2)
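If we want an ordering closer to the one in a dictionary, one possible approach is to ask `sorted` to compare lowercase versions of the strings through its `key` parameter. This is only a sketch; the list `names` and the variable names are made up for illustration.

```python
names = ['Bill', 'abe', 'William', 'bill']

# Default ordering: uppercase letters sort before lowercase ones
default_order = sorted(names)
print(default_order)

# Compare lowercase versions instead; the original strings are returned unchanged
dictionary_order = sorted(names, key=str.lower)
print(dictionary_order)
```

Strings whose lowercase versions are identical (here `'Bill'` and `'bill'`) keep their original relative order.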

### Finding text within string variables

####  `in` operator


+ The `in` operator, `needle in haystack`: reports if the string `needle` appears in the string `haystack`


For example, string "New York" appears within "New York University", so the following operator returns `True`:

In [None]:
"New York" in "New York University"

But, unlike reality, "New York University" is not in "New York" :-)

In [None]:
"New York University" in "New York"
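Like `==`, the `in` operator is case-sensitive. The following sketch shows the difference, and one way to relax it by lowercasing both sides; the variable names are chosen here just for the example.

```python
haystack = "New York University"

# The match is case-sensitive, so a lowercase needle is not found
exact_match = "new york" in haystack
print(exact_match)

# Lowercasing both sides makes the check case-insensitive
relaxed_match = "new york".lower() in haystack.lower()
print(relaxed_match)
```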


####  `find` function

* The `find` function, `haystack.find(needle)`: searches `haystack` for `needle` and returns the position of the first occurrence, indexed from 0; returns -1 if not found.

For example:

In [None]:
word = "Python is the word. And on and on and on and on..." 
position = word.find("on") # The 'on' appears at the end of 'Python'
print(position)

In [None]:
print("The first time that we see the string on is at position", word.find("on"))

In [None]:
print(word.find("python"))

If we want to find additional appearances of the string, we can pass a second parameter to the `find` function, specifying that the search should start at the position given by that parameter.

In [None]:
first_appearance = word.find("on")
second_appearance = word.find("on",first_appearance+1)
print("The second time that we see the string on is at position", second_appearance)
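The same idea can be repeated to collect every appearance: keep calling `find`, each time starting just after the previous match, until it returns -1. The sketch below uses a `while` loop and a list, which may not have been introduced yet; the names `needle` and `positions` are chosen for this example.

```python
word = "Python is the word. And on and on and on and on..."
needle = "on"

positions = []
start = word.find(needle)
while start != -1:                         # -1 means there are no more matches
    positions.append(start)
    start = word.find(needle, start + 1)   # resume just after the last match

print(positions)
```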

##### Exercise

Consider the string `billgates@microsoft.com`, stored in a variable `email`. Write code that finds the username of the email address and the domain of the email address. You will need to use the `find` function, and also use your knowledge of indexing and slicing for this exercise. Hint: You will need to search for the `@` character using `find`, and then use the result to get the parts of the string before and after the `@` character. (Do not worry if this seems tedious; this is mainly for practice. Later on, we will see how to do this in an easier way.)


**Answer:** <span style = "color:white"> 
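email = "billgates@microsoft.com"
split = email.find('@')
print(email[:split])     # username: everything before the '@'
print(email[split+1:])   # domain: everything after the '@'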

####  `count` function

+ `str_1.count(str_2)`: counts the number of (non-overlapping) occurrences of `str_2` in `str_1`.
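One detail worth knowing is that `count` only counts matches that do not overlap each other. A small sketch, using a made-up string `text`:

```python
text = "banana"

# 'ana' starts at positions 1 and 3, but the two matches overlap,
# so count reports only one of them
overlapping_example = text.count("ana")
print(overlapping_example)

# 'an' occurs twice without overlap
simple_example = text.count("an")
print(simple_example)
```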

In [None]:
word = "Python is the word. And on and on and on and on..."
lookfor = "on"
count = word.count(lookfor)
print( "We see the string '", lookfor  ,"' that many times: ",  count)

In [None]:
word = "Python is the word. And on and on and on and on..."
lookfor = "Python"
count = word.count(lookfor)
print( "We see the string '", lookfor  ,"' that many times: ",  count)

Of course, notice that if capitalization is different, the matches will not "count".

In [None]:
word = "Python is the word. And on and on and on and on..."
lookfor = "PYTHON"
count = word.count(lookfor)
print( "We see the string '", lookfor  ,"' that many times: ",  count)

##### Exercise

Convert the code above so that it works in a case-insensitive manner. Use the `lower()` or `upper()` function.

**Answer:** <span style = "color:white"> word.upper().count(lookfor.upper())

##### Exercise

Consider the news article from [the New York Times](https://www.nytimes.com/2010/06/21/sports/soccer/21diving.html), which is given below, and stored in the string variable `article`.

* Count how many times the player `Keita` appears in the article. 
* Count how many times the player `Ozil` appears in the article. 
* Count how many times the player `Ronaldo` appears in the article. 
* Count how many times the player `Grosso` appears in the article. 
* Now sum up the occurrences and display the percentage of coverage for Keita. (For example, if Keita appears 2 times and each other player appears one time, then Keita has 40% of the coverage.)

In [None]:
article = """
JOHANNESBURG — The Ivory Coast forward cried out in apparent agony, 
covered his face with his hands and dropped to the turf with a thud
in the waning minutes of his team’s 3-1 loss to Brazil on Sunday at the World Cup.

The forward, Abdul Kader Keita, was not hit with the ball or slapped
across the face or punched, just bumped by the Brazilian star Kaka, 
who did little more than shrug, sticking his right elbow into Keita’s chest.
That was all it took for Keita to fall to the turf as if he had been 
doused with pepper spray.

The referee punished Kaka with a yellow card, his second of the game, 
forcing his ejection and leaving his team a man down for the rest of the game.

Many who saw the replay wondered whether Keita’s fall was the tournament’s 
latest example of what officials call simulation. Much of the flopping, flailing 
and falling in soccer is little more than diving to the turf in an effort 
to dupe the referee.

If successful, the diver could be awarded an unimpeded kick from the 
point of the infraction or, if it occurs in the penalty area in front 
of the goal, a penalty kick from 12 yards.

Fans are already seeing as much bad playacting as tricky dribbling during 
the World Cup in South Africa, despite efforts by FIFA, the sport’s world 
governing body, to punish divers. Some of the best players in the world crumple 
under imaginary contact to win a penalty, or writhe in seeming pain to run the 
clock down or give their teammates a breather.

“I wish it wasn’t part of the game,” said Paul Tamberino, the director of referee
development for U.S. Soccer. “Players will do whatever they can.”

In his first game of the tournament, the German midfielder Mesut Ozil tumbled as if 
gnomes hiding in the grass at Durban Stadium had tied his shoelaces together during 
his team’s opening game against Australia last week. An innocuous challenge from a 
defender did not seem enough to send him to the ground. Indeed, no foul was called. 
Instead, for his ruse, Ozil was punished with a yellow card.

Ozil’s tumble and Keita’s pantomime did not affect the outcome of either game; 
Germany and Brazil easily won their matches. But with so few games, and with goals 
at a premium at this World Cup (1.97 goals per game through the first 29 games, 
well below the low-water mark of 2.21 in 1990), an erroneously awarded penalty or
unjust suspension could prove decisive.

In the Round of 16 at the World Cup in Germany four years ago, Italy and Australia 
were tied, 0-0, in added time. Italy’s Fabio Grosso rushed into the Australia penalty 
area and doubled over the lunging defender Lucas Neill. Italy was awarded a penalty kick, 
converted it for a 1-0 victory and went on to win the title. The penalty kick eliminated
Australia.

“When he slid in, maybe I accentuated a little bit,” Grosso told Football Plus magazine 
in the spring. “I felt the contact, so I went down.”

According to FIFA’s Laws of the Game, “attempts to deceive the referee by feigning 
injury or pretending to have been fouled” are punishable by a yellow card. But it can 
be difficult at full speed, with only the naked eye and no video replay to consult, for a 
referee to spot the difference between a foul and a phantom.

The referee Koman Coulibaly whistled a foul in favor of United States forward Jozy Altidore 
in the 85th minute of the Americans’ 2-2 tie with Slovenia on Friday — awarding a free kick 
that led to a controversial disallowed goal — but replays showed that minimal contact occurred 
before Altidore crashed to the ground. Players know they are more likely to get away with it 
and tacitly condone the practice.

“Personally, I’m against any type of simulation; I don’t think it should be part of the game,” 
Alessandro Del Piero, a teammate of Grosso’s, said recently, before adding: “It went in Italy’s
favor and I was happy. If it went against us, I would be upset.”

Del Piero and his countrymen were indeed upset in 2002 when Italy was knocked out of the World 
Cup in South Korea by the host team after Francesco Totti was ejected for a second yellow card 
after disingenuously trying to earn a free kick.

Referees say that it is difficult to penalize a player for simulation because it is akin to 
calling him dishonest. “If you’re going to give a caution for simulation and there is contact,” 
Tamberino warned, “it has to be very obvious that he’s trying to cheat.”

Sometimes, simulation is so transparent that it more resembles vaudeville than world-class soccer.
In 2002, Rivaldo of Brazil was waiting to take a corner kick when Hakan Unsal of Turkey appeared
to deliberately kick the ball at him when he was not looking. Rivaldo went down as if he had been 
shot, clutching his face, right in front of an assistant referee. Unsal was given a red card, signaling 
an ejection, but Rivaldo was embarrassed when video replay revealed his reaction to be an act. 
FIFA later fined him $7,350.

“Whether it be the big flop or the big groan, sometimes it’s comical,” Tamberino said. 
“You don’t like to laugh, but sometimes, you give a smile.”

Referees must be close to the play and have the right angle and clear vision to make 
the correct call. It requires good positioning and communication between the referee 
and his assistants, one on each sideline. Three years ago, FIFA started a referee assistance 
program to train officials to, among other things, spot dives. Before the World Cup, FIFA’s 
technical study committee provided referees and assistant referees with scouting reports,
including video, that highlighted teams with a reputation for simulation.

Being a good diver may help a player get an occasional call, but when he earns a reputation 
as a cheat, it can be difficult to erase. Portugal’s Cristiano Ronaldo, one of the fastest and
most dangerous forwards, is often the target of deliberate fouls or the victim of overzealous 
defenders.

“Referees should protect the more skillful players when they are getting fouled by the opposition,” 
he said Tuesday after Portugal’s game with Ivory Coast. “Sometimes it is difficult for me when the
referees give fouls because they think I dive.”

But they think that for good reason. Ronaldo was notorious for feigning when he was a young player. 
He was lethal on direct kicks, and he relished the chance to show off his ability to strike the ball 
from long range. But now that he has developed into one of the game’s most hardened players, who is 
often hacked mercilessly by slower, less-skilled defenders, he does not get many calls.

“They don’t protect talented footballers anymore,” said Carlos Queiroz, the coach of Portugal. 
“I’d like to see if the rules are the same for everybody.”

Players and referees say they know which players go down too easily, but they are reluctant 
to identify them. “I’m anticipating all forwards are capable of doing that,” said the American 
defender Oguchi Onyewu, who at 6 feet 4 inches and 210 pounds is an easy mark for conniving 
forwards who exploit the perception that a bigger player fouls more often.

“I think referees are taking the proper measures now more so than in the past to eliminate 
that kind of exaggeration,” Onyewu said. “Giving the players themselves a card and not the defender.”

But defenders and forwards agree that diving will be hard to eradicate as long as players get away 
with it. “You try to get the ref on your side and you hope that he can see through all of that,” 
said Jonathan Spector, a defender for the United States. “There’s only so much you can control in a game. 
The call was made. Just get on with it.”
"""

In [None]:
# Calculate 'keita', the number of times that Keita appears in the text

**Answer:** <span style = "color:white">
keita = article.count("Keita")
print(keita)

In [None]:
# Calculate 'ozil', the number of times that Ozil appears in the text

**Answer:** <span style = "color:white">
ozil = article.count("Ozil")
print(ozil)

In [None]:
# Calculate 'grosso', the number of times that Grosso appears in the text

**Answer:** <span style = "color:white">
grosso = article.count("Grosso")
print(grosso)

In [None]:
# Calculate 'ronaldo', the number of times that Ronaldo appears in the text

**Answer:** <span style = "color:white">
ronaldo = article.count("Ronaldo")
print(ronaldo)

In [None]:
# Compute 'perc_keita', the percentage for Keita

**Answer:** <span style="color:white">
perc_keita = 100 * keita / (keita + ozil + grosso + ronaldo)
print(perc_keita)

In [None]:
# All together
perc_keita = round(perc_keita,2)
print("Keita appears", keita, "times:", perc_keita, "%")

#### `startswith` and `endswith` functions

Finally, we can also check if a particular string starts or ends with another substring:

+ `haystack.startswith(needle)`: does the haystack string start with the needle string?
+ `haystack.endswith(needle)`: does the haystack string end with the needle string?


In [None]:
name = "New York University"
prefix = "New York"
print( "Does ", name  ," start with",  prefix, "?")
print(name.startswith(prefix))

In [None]:
name = "New York University"
prefix = "University"
print( "Does ", name  ," start with",  prefix, "?")
print(name.startswith(prefix))

In [None]:
name = "New York University"
suffix = "University"
print( "Does ", name  ," end with",  suffix, "?")
print(name.endswith(suffix))
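Both functions also accept a tuple of strings, in which case they return `True` if any one of the candidates matches. The following is a small sketch; the email address is reused from earlier, and the lists of suffixes and prefixes are made up for illustration.

```python
email = "professor@stern.nyu.edu"

# A tuple of candidates: True if the string ends with any one of them
is_known_domain = email.endswith((".edu", ".org"))
print(is_known_domain)

# The same works for prefixes
has_title = email.startswith(("dr.", "prof"))
print(has_title)
```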