# MATLAB Data Structures

1. Character Arrays
2. Formatting Strings
3. Cell Arrays
4. String Matching and Comparison
5. Set Operations
6. Structs
7. Struct Arrays
8. containers.Map

## 1. Character Arrays

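- Strings created with single quotes in MATLAB are character arrays (row vectors of characters), so they can be indexed, concatenated and reshaped much like numeric matrices. Newer MATLAB releases also have a separate string type created with double quotes; in Octave, double quotes still create a character array.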

In [14]:
A = 'Hello World'
B = A(1:5) % access the first 5 characters
C = [A; A] % stack two rows of equal length
D = 'z':-1:'a' % colon ranges work on characters: z down to a in steps of 1
check = ischar(A)
E = repmat('aiden ',2,5) % tile the char array 2 times down and 5 times across
F = isletter(A(1:10)) % check which characters are letters
G = isspace(A(1:10)) % check which characters are whitespace
H = upper(A) % convert to upper case
I = lower(A) % convert to lower case
J = '     Hello World      '
K = strtrim(J) % remove leading and trailing whitespace
L = deblank(J) % remove only trailing whitespace
M = repmat(L,2,2)
N = repmat(K,2,2)

A = Hello World
B = Hello
C =

Hello World
Hello World

D = zyxwvutsrqponmlkjihgfedcba
check =  1
E =

aiden aiden aiden aiden aiden 
aiden aiden aiden aiden aiden 

F =

   1   1   1   1   1   0   1   1   1   1

G =

   0   0   0   0   0   1   0   0   0   0

H = HELLO WORLD
I = hello world
J =      Hello World      
K = Hello World
L =      Hello World
M =

     Hello World     Hello World
     Hello World     Hello World

N =

Hello WorldHello World
Hello WorldHello World
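`C = [A; A]` above only works because both rows have the same length. A minimal sketch, assuming you want to stack words of different lengths (the words are made-up sample data): `char()` pads shorter rows with trailing spaces, and `deblank()` strips that padding again.

```matlab
% Stack words of different lengths into one char matrix.
% ['cat'; 'horse'] would fail because the rows differ in length,
% so char() pads each row with trailing spaces to the longest length.
words = char('cat', 'horse', 'ox');   % 3x5 char matrix
sz = size(words);                      % [3 5]
row1 = words(1, :);                    % 'cat  ' (padded)
row3 = deblank(words(3, :));           % 'ox' (padding removed)
```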



- **isstrprop(str, 'punct')** checks for punctuation characters and returns a logical array the same size as `str`
- **isstrprop(str, 'alphanum')** checks for alphanumeric characters (letters or digits)
- **isstrprop(str, 'digit')** checks for decimal digits
- **isstrprop(str, 'xdigit')** checks for valid hexadecimal digits (0-9, a-f, A-F)

In [19]:
str = '   a1!'
A = isstrprop(str,'punct') % check for punctuation
B = isstrprop(str,'alphanum')
C = isstrprop(str,'digit')
D = isstrprop('1F4A','xdigit')
E = isspace(str)
F = isletter(str)

str =    a1!
A =

   0   0   0   0   0   1

B =

   0   0   0   1   1   0

C =

   0   0   0   0   1   0

D =

   1   1   1   1

E =

   1   1   1   0   0   0

F =

   0   0   0   1   0   0
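The logical arrays returned by `isstrprop()` and `isletter()` can be used directly as indices. A small sketch, with a hypothetical input string, that strips punctuation and counts digits and letters:

```matlab
% Use the logical masks from isstrprop() to index into the string.
str = 'Hi, there! 42 apples.';
isPunct = isstrprop(str, 'punct');
noPunct = str(~isPunct);                    % 'Hi there 42 apples'
nDigits = sum(isstrprop(str, 'digit'));     % 2
nLetters = sum(isletter(str));              % 13
```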



In [21]:
% convert between characters and their ASCII codes
% from ASCII codes to characters, use char()
% from characters to ASCII codes, use abs() (double() works too)
A = char(65)
B = abs('B')
C = abs('abcdeG3')

A = A
B =  66
C =

    97    98    99   100   101    71    51
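Because characters are stored as numeric codes, arithmetic on a char array returns doubles, and `char()` turns the result back into text. A sketch of this; the Caesar shift is only an illustrative use chosen here, not something from the original notebook:

```matlab
% Arithmetic on a char array works on its character codes and returns doubles.
codes = 'a' + 1;                 % 98
nextChar = char('a' + 1);        % 'b'
% Caesar shift of lowercase letters by 3, wrapping from 'z' back to 'a'
plain = 'xyz';
shifted = char(mod(double(plain) - double('a') + 3, 26) + double('a'));   % 'abc'
```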



To convert between decimal numbers and string representations of hexadecimal or binary numbers, use:
- dec2hex()
- hex2dec()
- dec2bin()
- bin2dec()

To convert numeric matrices to strings, use num2str() or mat2str(); to parse a string back into a numeric matrix, use str2num():
- num2str()
- mat2str()
- str2num()

In [25]:
A = dec2hex(211)
B = hex2dec('D3')
C = dec2bin(211)
D = bin2dec('11010011')

A = D3
B =  211
C = 11010011
D =  211
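`dec2bin()` and `dec2hex()` also accept an optional minimum number of digits, which is handy for fixed-width bit patterns. A short sketch with arbitrary sample values:

```matlab
% dec2bin/dec2hex accept an optional minimum number of digits.
b8 = dec2bin(5, 8);          % '00000101'
h = dec2hex(255);            % 'FF'
back1 = bin2dec(b8);         % 5
back2 = hex2dec('ff');       % 255 (hex digits are case-insensitive)
```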


In [28]:
E = num2str([1:5;1:5])
F = mat2str([1:5;1:5])
c = str2num('44')

E =

1  2  3  4  5
1  2  3  4  5

F = [1 2 3 4 5;1 2 3 4 5]
c =  44
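A few more conversion details, sketched with sample values: `num2str()` accepts a precision (significant digits), `mat2str()` produces text that `str2num()` can parse back, and for a single number `str2double()` is often preferable because it does not evaluate its input and returns NaN on bad input.

```matlab
% num2str can take a precision (number of significant digits).
p = num2str(pi, 4);               % '3.142'
% mat2str output is valid MATLAB syntax, so str2num can parse it back.
M = [1 2; 3 4];
roundTrip = str2num(mat2str(M));  % [1 2; 3 4]
% For a single number, str2double avoids eval and returns NaN on bad input.
x = str2double('3.5');            % 3.5
bad = str2double('abc');          % NaN
```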


In [31]:
% concatenate vertically, padding shorter rows with trailing spaces
a = strvcat('hello','world','this', 'is','cs50')
% sort the rows alphabetically
b = sortrows(a)
% right-justify each row (strjust's default)
c = strjust(a)

a =

hello
world
this 
is   
cs50 

b =

cs50 
hello
is   
this 
world

c =

hello
world
 this
   is
 cs50
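An alternative to padded char matrices is a cell array of words, which `sort()` orders alphabetically without any padding; `char()` and `cellstr()` convert between the two forms. A sketch reusing the words above:

```matlab
% A cell array of words avoids padding altogether; sort() orders it alphabetically.
words = {'world', 'hello', 'cs50', 'is'};
sortedWords = sort(words);      % {'cs50', 'hello', 'is', 'world'}
padded = char(sortedWords);     % 4x5 char matrix, rows padded with spaces
back = cellstr(padded);         % 4x1 cell, trailing spaces removed
```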



## 2. Formatting Strings

- sprintf() returns the formatted text as a string.
- fprintf() writes the formatted text directly to the command window, or to a file if its first argument is a file identifier.
- Both take a format string followed by the values to insert.
- `%` marks a placeholder: %s is a string, %i or %d an integer, %x hexadecimal, and %05.2f a number printed at least 5 characters wide, zero-padded, with 2 digits after the decimal point.
- To see the full list, use **doc sprintf**.

In [38]:
fprintf('%s received the %s at age %i\n', 'Elizabeth Holmes', 'Horatio Alger Award', 31);
str = sprintf('%07.4f',pi) % at least 7 characters wide, zero-padded, with 4 digits after the decimal point
str = sprintf('%x',999) % %x prints the number in hexadecimal

Elizabeth Holmes received the Horatio Alger Award at age 31
str = 03.1416
str = 3e7
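Two more `sprintf()` behaviors worth knowing, sketched with arbitrary values: the format is reused until all arguments are consumed (so a vector can be printed with one short format), and a field width with `-` left-justifies instead of right-justifying.

```matlab
% The format is reused until all arguments are consumed.
list = sprintf('%d,', [1 2 3]);         % '1,2,3,'
list = list(1:end-1);                   % '1,2,3' (drop trailing comma)
% Field width and justification
right = sprintf('%5.1f|', 3.14159);     % '  3.1|'
left = sprintf('%-6s|', 'ab');          % 'ab    |'
```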


## 3. Cell Arrays
- MATLAB supports a general and powerful data structure called the cell array. Each cell can hold any type of MATLAB value, including numeric matrices of different sizes, character arrays, other cells, structs and objects.
- Cell arrays are created with curly braces { }. Indexing with { } returns a cell's contents, while indexing with ( ) returns a smaller cell array.

In [39]:
A = cell(2,4) % create a 2-by-4 cell array of empty matrices
check = iscell(A)

A = 
{
  [1,1] = [](0x0)
  [2,1] = [](0x0)
  [1,2] = [](0x0)
  [2,2] = [](0x0)
  [1,3] = [](0x0)
  [2,3] = [](0x0)
  [1,4] = [](0x0)
  [2,4] = [](0x0)
}
check =  1


In [42]:
B = { [1,2,3], 'hello', {1};  [3;5],'yes',{'no'}} % {1} and {'no'} are cell arrays nested inside B

B = 
{
  [1,1] =

     1   2   3

  [2,1] =

     3
     5

  [1,2] = hello
  [2,2] = yes
  [1,3] = 
  {
    [1,1] =  1
  }
  [2,3] = 
  {
    [1,1] = no
  }
}


In [50]:
C = B(1,2) % () returns a 1x1 cell array containing the element in row 1, column 2
size(C)
class(C)
D = B{1,2} % {} returns the contents themselves, here a char array
E = B{2,3} % the contents of this cell are themselves a cell array
F = B{1,1}
class(D)
class(E)
G = B(:,1) % returns a cell array holding the first column

C = 
{
  [1,1] = hello
}
ans =

   1   1

ans = cell
D = hello
E = 
{
  [1,1] = no
}
F =

   1   2   3

ans = char
ans = cell
G = 
{
  [1,1] =

     1   2   3

  [2,1] =

     3
     5

}
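A minimal sketch of a few other everyday cell operations, using a hypothetical cell `c`: `cellfun()` applies a function to every cell (with `'UniformOutput', false` when the results are not scalars), `{end+1}` grows the array, and `()` with `[]` deletes cells.

```matlab
c = {'apple', [1 2 3], {'nested'}};
lens = cellfun(@numel, c);                             % [5 3 1]
classes = cellfun(@class, c, 'UniformOutput', false);  % {'char', 'double', 'cell'}
c{end+1} = 42;                                         % grow with {} and end+1
c(2) = [];                                             % delete a cell with () and []
% c is now {'apple', {'nested'}, 42}
```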


## 4. String Matching and Comparison

1. **strcmp(A,B)**: compare strings A and B; returns logical true (1) only if they are identical, unlike C's strcmp, which returns 0 for equal strings
2. **strcmpi(A,B)**: compare strings A and B, ignoring case
3. **strncmp(A,B,numChar)**: compare only the first numChar characters of A and B
4. **strncmpi(A,B,numChar)**: same as strncmp, but ignoring case

In [2]:
A = 'testString'
test1 = strcmp(A,'testString')
test2 = strcmpi(A,'TESTSTRING') % compare two strings, ignoring case
test3 = strncmp(A,'tesTString',4) 
test4 = strncmpi(A,'TesTString',6)

A = testString
test1 =  1
test2 =  1
test3 = 0
test4 =  1
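The comparison functions also accept a cell array and compare each element, which is a common way to look up a name in a list. A sketch with hypothetical names:

```matlab
% strcmp also accepts a cell array and compares every element.
names = {'Aiden', 'bob', 'Cindy'};
hits = strcmp(names, 'bob');          % logical [0 1 0]
hitsCI = strcmpi(names, 'BOB');       % logical [0 1 0]
found = any(strcmp(names, 'Cindy'));  % true
% Unlike == (which compares element by element and errors when the lengths
% differ), strcmp simply returns false for strings of different lengths.
diffLen = strcmp('abc', 'abcd');      % false
```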


In [5]:
% use strfind() to find every occurrence of one substring inside another
str = 'actiohanlfeaoilaactiohanlfeaoilaactiohanlfeaoilaactiohanlfeaoila'
A = strfind(str,'nlfe') % 'nlfe' occurs 4 times, so this returns the starting index of each occurrence

str = actiohanlfeaoilaactiohanlfeaoilaactiohanlfeaoilaactiohanlfeaoila
A =

    8   24   40   56
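A few related details, sketched with sample strings: `strfind()` also reports overlapping matches, an empty result means no match, and `strrep()` replaces every occurrence of a substring.

```matlab
% strfind reports overlapping matches as well.
overlap = strfind('abababa', 'aba');                % [1 3 5]
% An empty result means the pattern does not occur.
missing = isempty(strfind('hello', 'xyz'));         % true
% strrep replaces every occurrence of a substring.
replaced = strrep('a cat, a hat', 'a ', 'one ');    % 'one cat, one hat'
```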



In [9]:
str = {'foobar','bar','ffoo','barfo2o','foofoo'}
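B = strmatch('foo',str) % indices of the entries that start with 'foo'; strmatch is not recommended in current MATLAB releases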

str = 
{
  [1,1] = foobar
  [1,2] = bar
  [1,3] = ffoo
  [1,4] = barfo2o
  [1,5] = foofoo
}
B =

   1   5
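Since `strmatch()` is not recommended in current MATLAB releases, here is a sketch of the same prefix search written with `strncmp()`, which works in both MATLAB and Octave (the `prefix` variable is introduced here for illustration):

```matlab
str = {'foobar','bar','ffoo','barfo2o','foofoo'};
prefix = 'foo';
% compare only the first numel(prefix) characters of each entry
idx = find(strncmp(str, prefix, numel(prefix)));   % [1 5]
```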



In [12]:
% strtok() returns the first token delimited by whitespace, plus the remainder of the string (which keeps its leading delimiter)
[token,remaining] = strtok('this is a test')
[token,remaining] = strtok(remaining) % call again on the remainder to get the next token

token = this
remaining =  is a test
token = is
remaining =  a test
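Calling `strtok()` in a loop splits a whole sentence into tokens. A minimal sketch (variable names are illustrative), followed by a custom delimiter and `strsplit()`, which does the whole split in one call:

```matlab
% Repeatedly call strtok() until no tokens remain.
remaining = 'this is a test';
tokens = {};
while true
  [tok, remaining] = strtok(remaining);
  if isempty(tok)
    break;
  end
  tokens{end+1} = tok;
end
% tokens is {'this', 'is', 'a', 'test'}
% strtok also takes custom delimiters, and strsplit does the whole job at once
[first, rest] = strtok('a,b,c', ',');         % 'a' and ',b,c'
parts = strsplit('this is a test', ' ');      % {'this', 'is', 'a', 'test'}
```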


## 5. Set Operations
MATLAB has no dedicated set type (unlike Python's `set`). Instead, the functions below treat numeric arrays and cell arrays of character vectors as sets and return their results in sorted order.

- union()
- intersect()
- setdiff()
- setxor()
- ismember()

In [18]:
set1 = 1:2:9
set2 = 1:4
inters = intersect(set1,set2)
uni = union(set1, set2)
% avoid naming variables diff or xor: that would shadow the built-in functions
dif12 = setdiff(set1, set2) % sorted values that are in set1 but not in set2
dif21 = setdiff(set2, set1) % sorted values that are in set2 but not in set1
symDif = setxor(set1, set2) % sorted values that are in exactly one of the two sets
check = ismember(4,set1)
check = ismember(4,set2)

set1 =

   1   3   5   7   9

set2 =

   1   2   3   4

inters =

   1   3

uni =

   1   2   3   4   5   7   9

dif12 =

   5   7   9

dif21 =

   2   4

symDif =

   2   4   5   7   9

check = 0
check =  1
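The same functions work on cell arrays of character vectors, and `ismember()` can also return the position of each match. A sketch with sample data:

```matlab
% The set functions also work on cell arrays of character vectors.
common = intersect({'a','b','c'}, {'b','c','d'});     % {'b', 'c'}
onlyFirst = setdiff({'a','b','c'}, {'b','c','d'});    % {'a'}
% ismember's second output gives the position of each match (0 if absent).
[tf, loc] = ismember([4 7 9], [1 3 5 7 9]);           % tf = [0 1 1], loc = [0 4 5]
```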


In [20]:
% extract the unique elements of a cell array or a matrix with unique()

uniqueNumbers = unique([1,2,3,4,3,2,1,4,5,6,1,2,3,4,12,4,5]) % returns the unique values in sorted order
uniqueNames = unique({'Aiden','Bob','Cindy','Bob','Aiden','David','cindy'}) % case-sensitive: uppercase letters sort before lowercase ones

uniqueNumbers =

    1    2    3    4    5    6   12

uniqueNames = 
{
  [1,1] = Aiden
  [1,2] = Bob
  [1,3] = Cindy
  [1,4] = David
  [1,5] = cindy
}
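The third output of `unique()` maps every element to its position in the unique list, which makes counting occurrences straightforward with `accumarray()`. A sketch with hypothetical data, plus a case-insensitive variant of the name example:

```matlab
% Count how often each value occurs.
x = [3 1 3 2 3 1];
[u, ~, j] = unique(x);            % u = [1 2 3]
counts = accumarray(j(:), 1);     % [2; 1; 3]
% Case-insensitive unique names: lower-case everything first.
names = unique(lower({'Aiden','Bob','Cindy','Bob','cindy'}));   % aiden, bob, cindy
```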


In [24]:
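A = ['bba';'bab';perms('aba');'aba'] % perms() generates all permutations of its input, so repeated letters give repeated rows
% uniqueElems(i,:) equals A(firstIndices(i),:), and A equals uniqueElems(perm,:)
% NOTE: this Octave run returned the LAST occurrence of each row (e.g. 7 for 'aab'); MATLAB returns the first by default
[uniqueElems,firstIndices,perm] = unique(A,'rows')
sorted = issorted(uniqueElems,'rows')
check = isequal(A,uniqueElems(perm,:),A(firstIndices(perm),:)) % A can be rebuilt from its unique rows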

A =

bba
bab
aba
baa
aab
baa
aab
aba
aba

uniqueElems =

aab
aba
baa
bab
bba

firstIndices =

   7
   9
   6
   2
   1

perm =

   5
   4
   2
   3
   1
   3
   1
   2
   2

sorted =  1
check =  1


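## 6. Structs

- A struct organizes data into named fields, which you access by name.
- **Structs behave somewhat like hashmaps, except that the keys (field names) must be valid variable names.** For arbitrary string keys, use containers.Map.
- A struct array, where every element has the same fields, can act like a small database table of records.
- You can store structs of any shape in cell arrays, and you can concatenate structs into struct arrays as long as they have the same field names.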

In [25]:
S = struct('name','aiden','age',20,'height',1:0.5:2)
check = isstruct(S)
names = fieldnames(S)
check2 = isfield(S,'age')
S = orderfields(S) % order the fields alphabetically
S = rmfield(S, 'height') % remove a field

S =

  scalar structure containing the fields:

    name = aiden
    age =  20
    height =

        1.0000    1.5000    2.0000


check =  1
names = 
{
  [1,1] = name
  [2,1] = age
  [3,1] = height
}
check2 =  1
S =

  scalar structure containing the fields:

    age =  20
    height =

        1.0000    1.5000    2.0000

    name = aiden

S =

  scalar structure containing the fields:

    age =  20
    name = aiden
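One `struct()` pitfall is worth a sketch: when a value is a cell array, `struct()` builds a struct array with one element per cell, so a cell meant for a single field has to be wrapped in a second pair of braces. `isfield()` also accepts a cell array of names. The field names here are illustrative.

```matlab
% A cell-array value makes struct() build a struct ARRAY, one element per cell.
s2 = struct('name', {'aiden', 'bob'});                % 1x2 struct array
% To store a cell array inside a single field, wrap it in another pair of braces.
s1 = struct('name', 'aiden', 'tags', {{'a', 'b'}});   % scalar struct
% isfield accepts a cell array of names and checks each one
present = isfield(s1, {'name', 'email'});             % logical [1 0]
```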



In [29]:
% access a field with the dot operator followed by the field name
name = S.name
age = S.age
height = S.height % fails: 'height' was removed by rmfield() in the previous cell

% a field can also be accessed with a string (dynamic field name)
name = S.('name')
age = S.('age')

name = aiden
age =  20
error: structure has no member 'height'
name = aiden
age =  20
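Dynamic field names combine naturally with `fieldnames()` to loop over every field, and `isfield()` avoids the error shown above when a field may be missing. A minimal sketch with illustrative data:

```matlab
% Dynamic field names let you loop over every field of a struct.
S = struct('name', 'aiden', 'age', 20);
fields = fieldnames(S);
parts = cell(1, numel(fields));
for k = 1:numel(fields)
  value = S.(fields{k});
  parts{k} = sprintf('%s=%s', fields{k}, num2str(value));
end
summary = strjoin(parts, ', ');   % 'name=aiden, age=20'
% Guard against missing fields instead of triggering an error
if isfield(S, 'height')
  h = S.height;
else
  h = [];
end
```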


In [32]:
% set a new value for an existing field
S.name = 'Bob'
S.age = 90
% assigning to a field that does not exist yet adds it
S.popularity = 'tunnel'
S.gender = 1

S =

  scalar structure containing the fields:

    age =  90
    name = Bob

S =

  scalar structure containing the fields:

    age =  90
    name = Bob

S =

  scalar structure containing the fields:

    age =  90
    name = Bob
    popularity = tunnel

S =

  scalar structure containing the fields:

    age =  90
    name = Bob
    popularity = tunnel
    gender =  1



**Notes:**

- When field names are generated dynamically, i.e. at runtime, it is prudent to make sure each string is a valid field name.
- Field names must begin with a *letter and can contain only letters, numbers and the underscore symbol*.
- To check whether a string is valid, use **isvarname()**; to generate a valid name from an arbitrary string, use **genvarname()** (newer MATLAB releases recommend **matlab.lang.makeValidName()** instead).

In [35]:
test = isvarname('aiden')
test = isvarname('12a')
better = genvarname('~?!')

test =  1
test = 0
better = x___
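A sketch of the idea behind these notes: turning arbitrary keys into valid field names before using them as dynamic field names. The sanitizing rule here (replace invalid characters with `_`, prefix `x` if the name does not start with a letter) is a hypothetical simplification chosen for illustration; it does not handle reserved keywords or names longer than `namelengthmax`, which `genvarname()` and `matlab.lang.makeValidName()` take care of.

```matlab
% Hypothetical sanitizing logic: turn arbitrary keys into valid field names.
rawKeys = {'alpha', '2beta', 'gamma ray'};
vals = [1 2 3];
S = struct();
for k = 1:numel(rawKeys)
  name = regexprep(rawKeys{k}, '[^A-Za-z0-9_]', '_');   % replace invalid characters
  if isempty(name) || ~isletter(name(1))
    name = ['x' name];                                  % must start with a letter
  end
  S.(name) = vals(k);
end
% S now has the fields alpha, x2beta and gamma_ray
```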


## 7. Struct Arrays

- We can create an array of structs that all share the same field names, which lets us build a simple database of records.

In [40]:
S = struct('name',{},'id',{},'email',{})

S =

  0x0 struct array containing the fields:

    name
    id
    email



In [49]:
% add records by assigning past the end of the array; it grows automatically
S(1).name = 'Aiden'; S(1).('id') = 1122; S(1).email = 'aiden@example.com';
S(2).name = 'Bob'; S(2).id = 11444; S(2).email = 'bob@tunnel.com';
S(3).name = 'Cindy'; S(3).id = 420; S(3).email = 'cindy@cambridge.com';

In [50]:
% access an individual record, which is itself a struct
AidenRecord = S(1)
BobRecord = S(2)

AidenRecord =

  scalar structure containing the fields:

    name = Aiden
    id =  1122
    email = aiden@example.com

BobRecord =

  scalar structure containing the fields:

    name = Bob
    id =  11444
    email = bob@tunnel.com



In [51]:
% or access a field of all records at once: S.id expands to a comma-separated list
[one,two,three] = S.id
[name1,name2,name3] = S.name

one =  1122
two =  11444
three =  420
name1 = Aiden
name2 = Bob
name3 = Cindy


In [56]:
% wrap the comma-separated list in [] to concatenate it horizontally
A = [S.id]
% or pass it to vertcat() to stack the values into a column vector
B = vertcat(S.id)

A =

    1122   11444     420

B =

    1122
   11444
     420
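Combining `[S.id]` with logical indexing and `sort()` gives simple database-style queries over a struct array. A sketch that rebuilds the three records above in one `struct()` call (cell values create one element per cell):

```matlab
S = struct('name', {'Aiden', 'Bob', 'Cindy'}, 'id', {1122, 11444, 420});
ids = [S.id];                          % [1122 11444 420]
big = S(ids > 1000);                   % 1x2 struct array (Aiden, Bob)
bigNames = {big.name};                 % {'Aiden', 'Bob'}
[~, order] = sort(ids);
sortedNames = {S(order).name};         % {'Cindy', 'Aiden', 'Bob'}
```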



**Notes**
- We can create structs from cell arrays using the **cell2struct()** function, and convert them back with the **struct2cell()** function.

In [59]:
data = {1,2,3,4}
fieldname = {'one','two','three','four'}
dim = 2; % the field names correspond to dimension 2 (the columns) of data
S = cell2struct (data,fieldname,dim)


data = 
{
  [1,1] =  1
  [1,2] =  2
  [1,3] =  3
  [1,4] =  4
}
fieldname = 
{
  [1,1] = one
  [1,2] = two
  [1,3] = three
  [1,4] = four
}
S =

  scalar structure containing the fields:

    one =  1
    two =  2
    three =  3
    four =  4
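Going the other way, `struct2cell()` and `fieldnames()` both return column cell arrays, so `cell2struct(..., 1)` rebuilds the original struct. A small round-trip sketch:

```matlab
S = struct('one', 1, 'two', 2);
vals = struct2cell(S);               % {1; 2}
names = fieldnames(S);               % {'one'; 'two'}
T = cell2struct(vals, names, 1);     % fields run along dimension 1 of vals
same = isequal(S, T);                % true
```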



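## 8. containers.Map

- `containers.Map` stores key-value pairs whose keys can be arbitrary strings (or numbers), unlike struct field names. The Octave kernel used for this notebook did not provide the class, so the cell that used it failed with:

error: 'containers' undefined near line 1 column 14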
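MATLAB and newer Octave releases do provide `containers.Map`. A minimal sketch, assuming such a version, with illustrative keys and values; note that the key `'gamma ray'` would not be a valid struct field name:

```matlab
% containers.Map: keys can be any character vectors, not just valid names.
m = containers.Map();          % default KeyType 'char', ValueType 'any'
m('aiden') = 20;
m('gamma ray') = 3;            % would not be a valid struct field name
hasKey = isKey(m, 'aiden');    % true
age = m('aiden');              % 20
k = keys(m);                   % {'aiden', 'gamma ray'} (keys are sorted)
n = m.Count;                   % 2
remove(m, 'gamma ray');        % Map is a handle object, so m changes in place
```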
